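SNMP and Prometheus
SNMP and Prometheus use different data formats and protocols, so they cannot talk to each other directly; snmp_exporter bridges the two. snmp_exporter is an open-source network device monitoring tool from the Prometheus project: it collects information over SNMP and exposes it for Prometheus to scrape. It has two parts: the exporter itself, which does the actual scraping, and a generator (which depends on NetSNMP) that creates the configuration used by the exporter.
Related components
1. prometheus: the monitoring system; collects metrics and defines alerting rules
2. snmp-exporter: exposes switch metrics collected over the SNMP protocol
3. SNMP Exporter Config Generator: uses NetSNMP to parse MIBs and generates the snmp_exporter configuration file from them
4. MIB and OID: MIB stands for Management Information Base, a database used to manage the entities in a communication network. The database is hierarchical (a tree), and each entry is addressed by an object identifier (OID)
5. SNMP: a standard protocol designed for managing network nodes (servers, workstations, routers, switches, hubs, and so on) on IP networks
Data flow:
Server (SNMP agent) -> snmp_exporter -> Prometheus
Test environment:
The monitored SNMP target is the snmpd service on CentOS 7, used here to demonstrate collecting system metrics
Obtaining MIBs:
Ask the vendor for them, or look in the GitHub repository below, which contains many basic MIBs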
https://github.com/librenms/librenms/tree/master/mibs
I. SNMP service configuration
1. Install the SNMP agent
yum install -y net-snmp-utils net-snmp net-snmp-devel net-snmp-libs net-snmp-perl mrtg
systemctl enable snmpd
systemctl start snmpd
2. Configure /etc/snmp/snmpd.conf

systemctl restart snmpd
3. Test the SNMP protocol
Local test: snmpwalk -v 2c -c public localhost 1.3.6.1.2.1.25.3.3.1.1
Remote test: snmpwalk -v 2c -c public 172.16.0.112 1.3.6.1.2.1.25.3.3.1.1

SNMPv2-SMI here refers to the SNMPv2-SMI.txt file under /usr/share/snmp/mibs
snmptranslate -Tz -m ./SNMPv2-SMI.txt

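II. Install the snmp_exporter server
A single snmp_exporter instance can serve thousands of SNMP devices and does not have to run on the same machine as the SNMP agents
1. Install snmp_exporter
Download the snmp_exporter release package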
tar -zxvf snmp_exporter-0.20.0.linux-amd64.tar.gz
mv snmp_exporter-0.20.0.linux-amd64 /usr/local/snmp_exporter
Add a systemd service
vim /usr/lib/systemd/system/snmp_exporter.service
[Unit]
Description=SNMP Exporter
After=network-online.target

# This assumes you are running snmp_exporter under the user "prometheus"
[Service]
Type=simple
Restart=on-failure
ExecStart=/usr/local/snmp_exporter/snmp_exporter --config.file=/usr/local/snmp_exporter/snmp.yml
RestartSec=5

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start snmp_exporter
systemctl enable snmp_exporter
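III. Custom SNMP metric collection with the generator
The official default snmp.yml already covers most use cases. If you need to generate your own configuration from MIBs, customize which objects are walked, or use non-public MIBs, use the generator.
The snmp_exporter setup above used the official default snmp.yml. Some devices are not covered by it, in which case the configuration has to be generated from the MIB files
1. Install the build dependencies
Install the dependencies:
yum -y install gcc gcc-c++ make net-snmp net-snmp-utils net-snmp-libs net-snmp-devel git
2. Install Go
Option 1: install Go with yum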
yum install -y epel-release
yum install -y golang
[root@docker02 generator]# go version
go version go1.16.13 linux/amd64
Option 2: install Go from the official tarball (used to build the snmp_exporter configuration on the local system)
Install Go from the tarball
wget https://golang.google.cn/dl/go1.15.4.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.15.4.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
go version
go env -w GOPROXY=https://goproxy.io,direct
3. Build and configure the generator
git clone https://github.com/prometheus/snmp_exporter.git
cd snmp_exporter/generator
go build
### go build downloads Go dependencies from servers outside China; if it hangs and cannot finish the download, set a proxy and run go build again
export GOPROXY=https://goproxy.io
export GO111MODULE=on

make mibs
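After these steps a mibs directory appears in the current directory, containing the downloaded MIB files. If part of the OID tree is vendor-specific, ask the vendor for the MIB files (note: names inside a MIB file must not contain Chinese characters) and put them in the mibs directory.
If you use a private MIB, generator.yml has to be changed accordingly
mv generator.yml generator.yml.bak
vim generator.yml (the file content after editing is as follows):
modules:
  test:
    walk:
      - 1.3.6.1.2.1.25.3.3.1.1
    version: 2
    auth:
      community: public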
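When several modules are needed, the generator.yml layout above can be produced from a small helper rather than edited by hand. The following is only a sketch: gen_module_yml is a hypothetical function name, and it assumes the same layout as the file shown above (walk, version and auth directly under the module).

```shell
#!/usr/bin/env bash
# Sketch: print a minimal generator.yml for one module to stdout.
# gen_module_yml is a hypothetical helper, not part of snmp_exporter.
# Arguments: <module name> <community> <oid or object name>...
gen_module_yml() {
  local module="$1" community="$2" oid
  shift 2
  printf 'modules:\n'
  printf '  %s:\n' "$module"
  printf '    walk:\n'
  for oid in "$@"; do
    printf '      - %s\n' "$oid"
  done
  printf '    version: 2\n'
  printf '    auth:\n'
  printf '      community: %s\n' "$community"
}
```

For example, gen_module_yml test public 1.3.6.1.2.1.25.3.3.1.1 > generator.yml writes the same content as the file above.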
Run the following to generate a new snmp.yml file
export MIBDIRS=mibs
or: export MIBDIRS=/usr/share/snmp/mibs/
./generator generate
Replace the snmp.yml used by snmp_exporter with the newly generated snmp.yml

cp /opt/snmp_exporter/generator/snmp.yml /usr/local/snmp_exporter/snmp.yml
systemctl restart snmp_exporter
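4. Verify the service
Open the snmp_exporter web page on port 9116, enter the module defined in generator.yml, and click Submit to check the returned result


The result matches the output of snmpwalk -v 2c -c public 172.16.0.112 1.3.6.1.2.1.25.3.3.1.1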
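The same check can be done from the command line, since the web form simply requests the /snmp endpoint with target and module query parameters. A minimal sketch, where snmp_probe_url is a hypothetical helper name and the addresses are the ones used in this document:

```shell
#!/usr/bin/env bash
# Sketch: build the URL that the snmp_exporter web form requests.
# snmp_probe_url is a hypothetical helper name.
# Arguments: <exporter host:port> <SNMP target> <module>
snmp_probe_url() {
  local exporter="$1" target="$2" module="$3"
  printf 'http://%s/snmp?target=%s&module=%s\n' "$exporter" "$target" "$module"
}

# Usage (requires a running snmp_exporter, so it is left commented out):
#   curl -s "$(snmp_probe_url 172.16.0.112:9116 172.16.0.112 test)"
```

The returned text should contain the metrics walked by the test module.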
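5. Alternative: generate snmp.yml with Docker
To run the generator in Docker to produce the snmp.yml configuration, run the commands below.
The Docker image expects a directory containing generator.yml and a directory named mibs that holds all the MIBs you want to use.
This example generates the sample snmp.yml that is included at the top level of the snmp_exporter repo: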
make mibs
docker build -t snmp-generator .
docker run -ti \
-v "${PWD}:/opt/" \
snmp-generator generate
=====================================================
Or:
mkdir generator
cd generator
vi generator.yml
docker run -it -v "${PWD}:/opt/" snmp-generator generate
docker run -it -v "${PWD}:/opt/" prom/snmp-generator:master generate
IV. Prometheus configuration
1. Add the following to the Prometheus configuration file
  - job_name: 'snmp'
    scrape_interval: 10s
    static_configs:
      - targets:
        - 172.16.0.167  # switch IP address
    metrics_path: /snmp
    params:
      module: [test]  # module name from generator.yml or the snmp_exporter configuration file
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.16.0.112:9116  # snmp_exporter address
2. Reload the Prometheus configuration
curl -X POST 127.0.0.1:9090/-/reload
3. Query the data
http://172.16.1.84:9090/targets


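V. Integrating Genesys with Prometheus
Prerequisites:
- The Genesys license version must be 11.16 or later; with an older version no data can be retrieved
- Prepare the snmp_exporter server and the generator tool
Step 1: verify that the SNMP service works
For example, connect to Genesys with a MIB browser, import the Genesys MIB file GENESYS-SML-MIB-G7, then right-click gServerTable and choose Table View to check whether data is returned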

Step 2: import the MIB file on the snmp_exporter server
Copy GENESYS-SML-MIB-G7.txt into /usr/share/snmp/mibs/, then verify with the commands below that data can be retrieved
snmpwalk -v 2c -m all -c public 172.16.5.51 gServerVersion
or
snmpwalk -v 2c -m GENESYS-SML-MIB-G71 -c public 172.16.5.51 gServerVersion

Step 3: create the generator.yml file
vim generator.yml
modules:
  genesys:
    walk:
      - servers
      - genericServer
      - gsCleanupTimeout
      - gServersTable
      - gServersEntry
      - gServerId
      - gServerName
      - gServerStatus
      - gServerType
      - gServerVersion
      - gServerWorkDir
      - gServerCommandLine
      - gServerPID
      - gServerCommand
      - gServerDeleteClient
      - gServerControlTable
      - gServerControlEntry
      - gsCtrlServerID
      - gsCtrlTableID
      - gsCtrlRefreshStatus
      - gsCtrlLastRefreshed
      - gsCtrlAutomaticRefresh
      - gsCtrlRowStatus
      - gsInfoTable
      - gsInfoEntry
      - gsClientsExistNum
      - gsClientsTotalNum
      - gsServerConfigFile
      - gsLogTable
      - gsLogEntry
      - gsPollingTable
      - gsPollingEntry
      - gsPollingStatus
      - gsPollingInterval
      - gsPollingID
      - gsPollingLastTrap
      - gsClientTable
      - gsClientEntry
      - gsClientSocket
      - gsClientAppName
      - gsClientAuthorized
      - gsClientType
      - gsClientGotEvents
      - gsClientSentReqs
      - specificServer
      - tServer
      - tsInfoTable
      - tsInfoEntry
      - tsCallsExistNum
      - tsCallsTotalNum
      - tsLinksCommand
      - tsLastChangedLinkStatus
      - tsCallTable
      - tsCallEntry
      - callInstanceID
      - callConnID
      - callState
      - callCallID
      - callType
      - callReferenceID
      - callTimeStamp
      - callDNIS
      - callANI
      - callNumParties
      - callPartiesList
      - callCustomerID
      - callFirstTransferLocation
      - callFirstTransferDN
      - callLastTransferLocation
      - callLastTransferDN
      - tsDtaTable
      - tsDtaEntry
      - tsDtaInstanceID
      - tsDtaDigits
      - tsDtaMode
      - tsDtaState
      - tsDtaType
      - tsLinkTable
      - tsLinkEntry
      - tsLinkID
      - tsLinkName
      - tsLinkStatus
      - tsLinkProtocol
      - tsLinkSocket
      - tsLinkPID
      - tsLinkDelay
      - tsLinkPort
      - tsLinkAddress
      - tsLinkX25LocalAddress
      - tsLinkMode
      - tsLinkX25Device
      - tsLinkDTEClass
      - tsLinkTemplate
      - tsCallFilterTable
      - tsCallFilterEntry
      - fltCallCreatedBefore
      - fltCallCreatedAfter
      - fltCallUpdatedBefore
      - fltCallUpdatedAfter
      - fltClearCallByConnId
      - tsCallInfoTable
      - tsCallInfoEntry
      - callInfoInstanceID
      - callInfoConnID
      - callInfoType
      - callInfoCreationTimestamp
      - callInfoLastUpdatedTimestamp
      - callInfoInternalParties
      - tsLinkStatsTable
      - tsLinkStatsEntry
      - linkId
      - timeElapsedSec
      - numberMessagesTx
      - rateMessagesTx
      - numberMessagesRx
      - rateMessagesRx
      - hosts
      - hostsTable
      - hostsEntry
      - hostId
      - hostName
      - hostStatus
      - hostIPAddress
      - hostOSType
      - hostLCAPort
      - solutions
      - solutionsTable
      - solutionsEntry
      - solutionId
      - solutionName
      - solutionType
      - solutionStatus
      - solutionControlServer
      - solutionsComponentsTable
      - solutionsComponentsEntry
      - componentId
      - componentName
      - notifications
      - gsAlarm
      - gsMLAlarm
      - gsServerUpTrap
      - gsServerDownTrap
      - gsPollingSignal
      - tsLinkStatusTrap
      - genesysmibConformance
      - genesysmibGroups
      - gServersListGroup
      - genericServerControlGroup
      - genericServerInfoGroup
      - genericServerLogGroup
      - genericServerPollingGroup
      - genericServerClientGroup
      - tServerInfoGroup
      - tServerCallGroup
      - tServerDtaGroup
      - tServerLinkGroup
      - genericAlarmObjectGroup
      - specificAlarmObjectGroup
      - hostsGroup
      - solutionsGroup
      - solutionsComponentsGroup
      - notificationGroup
      - tServerCallFilterGroup
      - tServerCallInfoGroup
      - genesysmibCompliances
      - genesysmibCompliance
    version: 2
    auth:
      community: public
Generate the snmp.yml file
./generator generate
Replace the default snmp_exporter file
cp /opt/snmp_exporter/generator/snmp.yml /usr/local/snmp_exporter/snmp.yml
systemctl restart snmp_exporter

Seeing data in the result means the service is retrieving data correctly

Step 4: Prometheus configuration
See the Prometheus configuration section above
  - job_name: 'snmp'
    scrape_interval: 10s
    static_configs:
      - targets:
        - 172.16.5.51  # Genesys address
    metrics_path: /snmp
    params:
      module: [genesys]  # module name from generator.yml or the snmp_exporter configuration file
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.16.0.112:9116  # snmp_exporter address



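Alerting on this data is somewhat awkward because the retrieved data is not in plain key/value form: string values such as the server name are exposed as labels. Alert rules therefore have to be written as custom query expressions (PromQL) that join the series together
====================================================================================
genesys: Prometheus alerting metrics
gServerStatus values are listed below; currently only states 1, 2, 3, and 6 raise alerts
1: statusUnknown
2: statusStopped
3: statusPending
4: statusRunning
5: statusInitializing
6: statusServiceUnavailable
7: statusSuspending
8: statusSuspended
Alert condition expressions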
0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==1
0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==2
0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==3
0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==6
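For scripts that post-process these values (for example when formatting a notification), the status table and the alerting choice above can be captured in two small functions. This is only a sketch; both function names are hypothetical, and the mapping is copied from the list above.

```shell
#!/usr/bin/env bash
# Sketch: map a gServerStatus code to its name (values from the table above).
# gserver_status_name and gserver_status_alerts are hypothetical helper names.
gserver_status_name() {
  case "$1" in
    1) echo statusUnknown ;;
    2) echo statusStopped ;;
    3) echo statusPending ;;
    4) echo statusRunning ;;
    5) echo statusInitializing ;;
    6) echo statusServiceUnavailable ;;
    7) echo statusSuspending ;;
    8) echo statusSuspended ;;
    *) echo unmapped; return 1 ;;
  esac
}

# Succeeds (exit status 0) only for the states this document alerts on.
gserver_status_alerts() {
  case "$1" in
    1|2|3|6) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, gserver_status_alerts 4 fails, because statusRunning is the healthy state.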
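To exclude services from alerting, configure as follows:
Suppose the services listed below (gServerId="106" and the others) are to be excluded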
{gServerId="106", gServerName="LOGDAP", instance="172.16.5.51", job="genesys_snmp"} 1
{gServerId="107", gServerName="ERSDAP", instance="172.16.5.51", job="genesys_snmp"} 1
{gServerId="135", gServerName="CFG_ICONDAP", instance="172.16.5.51", job="genesys_snmp"} 1
============================================================================================
0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus{gServerId!~"(106|107|135)"} ==1
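When the exclusion list changes often, the expression above can be assembled from a list of IDs. A minimal sketch, assuming bash; exclude_matcher and status_expr are hypothetical helper names, and the output format simply reproduces the expression shown above.

```shell
#!/usr/bin/env bash
# Sketch: build the gServerId exclusion matcher from a list of IDs.
# exclude_matcher and status_expr are hypothetical helper names.
exclude_matcher() {
  local IFS='|'                       # "$*" joins arguments with the first IFS character
  printf 'gServerId!~"(%s)"\n' "$*"
}

# Arguments: <status code> [excluded gServerId]...
status_expr() {
  local status="$1" sel='gServerStatus'
  shift
  if [ "$#" -gt 0 ]; then
    sel="gServerStatus{$(exclude_matcher "$@")}"
  fi
  printf '0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) %s ==%s\n' "$sel" "$status"
}
```

Running status_expr 1 106 107 135 prints the expression above; status_expr 2 with no IDs prints the unfiltered form.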
Alert rules written by hand:
vim alert_genesys.rules
groups:
- name: genesys-alert-rule
  rules:
  - alert: gsAlarm statusUnknown
    expr: 0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==1
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.gServerName }} statusUnknown"
      description: "{{ $labels.gServerName }}: the service with ID {{ $labels.gServerId }} is abnormal, please check it promptly!"
  - alert: gsAlarm statusStopped
    expr: 0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==2
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.gServerName }} statusStopped"
      description: "{{ $labels.gServerName }}: the service with ID {{ $labels.gServerId }} is abnormal, please check it promptly!"
  - alert: gsAlarm statusPending
    expr: 0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==3
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.gServerName }} statusPending"
      description: "{{ $labels.gServerName }}: the service with ID {{ $labels.gServerId }} is abnormal, please check it promptly!"
  - alert: gsAlarm statusServiceUnavailable
    expr: 0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==6
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.gServerName }} statusServiceUnavailable"
      description: "{{ $labels.gServerName }}: the service with ID {{ $labels.gServerId }} is abnormal, please check it promptly!"
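The four rules differ only in the status code and name, so the file can also be generated with a loop. The following is a sketch under that assumption; gen_status_rules is a hypothetical helper name, and the description annotation is left out to keep the example short.

```shell
#!/usr/bin/env bash
# Sketch: print the genesys-alert-rule group for the alerted states (1, 2, 3, 6).
# gen_status_rules is a hypothetical helper name.
gen_status_rules() {
  local pair code name
  printf 'groups:\n'
  printf -- '- name: genesys-alert-rule\n'
  printf '  rules:\n'
  for pair in 1:statusUnknown 2:statusStopped 3:statusPending 6:statusServiceUnavailable; do
    code="${pair%%:*}"   # text before the colon
    name="${pair#*:}"    # text after the colon
    printf '  - alert: gsAlarm %s\n' "$name"
    printf '    expr: 0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus ==%s\n' "$code"
    printf '    for: 10s\n'
    printf '    labels:\n'
    printf '      severity: critical\n'
    printf '    annotations:\n'
    printf '      summary: "{{ $labels.gServerName }} %s"\n' "$name"
  done
}
```

Redirecting the output to alert_genesys.rules and checking it with promtool check rules before reloading Prometheus is a reasonable way to use it.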


=================================================================================
Alerting with traps
gsServersLastTrap{gsServersLastTrap=~'.*DOWN'}==1
gsServersLastAlarm{gsServersLastAlarm=~'.*'}==1
vim alert_genesys.rules
groups:
- name: genesys-alert-rule
  rules:
  - alert: gsServersLastTrap
    expr: gsServersLastTrap{gsServersLastTrap=~'.*DOWN'}==1
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "gsServersLastTrap notification"
      description: "{{ $labels.gsServersLastTrap }}"
  - alert: gsServersLastAlarm
    expr: gsServersLastAlarm{gsServersLastAlarm=~'.*'}==1
    for: 10s
    labels:
      severity: critical
    annotations:
      summary: "gsServersLastAlarm notification"
      description: "{{ $labels.gsServersLastAlarm }}"


The Alertmanager email configuration is as follows:
vim prometheus.yml
global:
  scrape_interval: 60s
  evaluation_interval: 60s
  scrape_timeout: 30s
rule_files:
  - /etc/prometheus/*.rules
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['172.16.1.84:9093']
vim alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_require_tls: false
  smtp_smarthost: 'smtp.163.com:25' # SMTP server address
  smtp_from: '15188257614@163.com' # sender address
  smtp_auth_username: '15188257614@163.com' # mailbox user
  smtp_auth_password: 'JPLGFRGKFPVJIXGU' # mailbox password
templates:
  - '/etc/alertmanager/email.tmpl'
route:
  group_by: ["alertname"] # label to group by
  group_wait: 60s # after an alert arrives, wait 60s for further alerts and send them together
  group_interval: 5m
  repeat_interval: 1h # interval between repeated notifications
  receiver: 'email' # default receiver; required, and must match a receiver name below
receivers:
  - name: 'email' # receiver name
    email_configs:
      #- to: 'doufadong@wilcom.com.cn,yf-tangjun@wilcom.com.cn,luchengjun@wilcom.com.cn' # recipient addresses
      - to: 'doufadong@wilcom.com.cn' # recipient address
        html: '{{ template "email.to.html" . }}'
        send_resolved: true
        headers: {Subject: "Alert email notification"}
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
[root@harbor alertmanager]# cat email.tmpl
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Local }} <br>
=========end==========<br>
{{ end }}
{{ end }}
========================================================
Integrating with Grafana alerting
- Edit the Grafana SMTP settings and restart the service
vi conf/defaults.ini
#################################### SMTP / Emailing #####################
[smtp]
enabled = true
host = smtp.163.com:25
user = 15188257614@163.com
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password = JPLGFRGKFPVJIXGU
cert_file =
key_file =
skip_verify = false
from_address = 15188257614@163.com
from_name = Grafana
ehlo_identity =
startTLS_policy =
[emails]
welcome_email_on_sign_up = false
templates_pattern = emails/*.html
Set the notification channel to email and fill in the recipient address

- Create the Genesys dashboards and alert rules






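The steps above only return data from the gServerTable table. Other tables return nothing because they have not been bound to the T-Server; configure the binding as follows




.116.1-11 (116 is the T-Server index, 1-11 are the different services, and the status must be set to 4). Once this is set, the related data can be retrieved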

Automatic configuration with a script
vim gs_check_snmp.sh
#!/bin/bash
gs_snmp_ip=172.16.5.51
gs_server_id=`snmpwalk -v 2c -m GENESYS-SML-MIB-G71 -c public $gs_snmp_ip |grep SIPServer |grep "is UP" |awk -F . '{print$15}' |tail -n1`
gs_run_count=`snmpwalk -v 2c -m GENESYS-SML-MIB-G71 -c public $gs_snmp_ip |grep SIPServer |grep "is UP" |wc -l`
gs_conf_count=`snmpwalk -v 2c -m GENESYS-SML-MIB-G71 -c public $gs_snmp_ip gsCtrlRowStatus |grep -v "gsCtrlRowStatus = No Such Object available on this agent at this OID"|wc -l`
### Check whether the gs sipserver is running normally
if [ "$gs_run_count" -eq "0" ];then
  echo -e "\033[32m sipserver service is not running \033[0m"
else
  if [ "$gs_conf_count" -eq "0" ];then
    echo -e "\033[32m configuring gs_snmp \033[0m"
    for i in {1,2,4,5,6,7,8,9,10};do
      snmpset -c private -v 2c $gs_snmp_ip 1.3.6.1.4.1.1729.100.1.3.1.6.$gs_server_id.$i i 4
    done
    echo -e "\033[32m gs_snmp configured successfully \033[0m"
  else
    echo -e "\033[32m gs_snmp needs no configuration \033[0m"
    exit 1
  fi
fi
# Delete operation
#snmpset -c private -v 2c 172.16.5.51 1.3.6.1.4.1.1729.100.1.3.1.6.116.8 i 6
###### The row status column object has 6 defined values:
#active(1): the row is available for use
#notInService(2): the row exists but is not available
#notReady(3): the row exists but cannot be used because required information is missing
#createAndGo(4): set by the manager to create a conceptual row and set its status column to active
#createAndWait(5): set by the manager to create a conceptual row that is not yet available
#destroy(6): delete the row
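Before letting the script write to the agent, it can help to print the snmpset commands it would run. The sketch below is a dry run only: gs_ctrl_set_cmds is a hypothetical helper name, and the OID prefix, community and table indexes are copied from the script above.

```shell
#!/usr/bin/env bash
# Sketch: print (do not execute) the snmpset commands used by gs_check_snmp.sh.
# gs_ctrl_set_cmds is a hypothetical helper name.
# Arguments: <Genesys SNMP host> <T-Server index, e.g. 116>
gs_ctrl_set_cmds() {
  local host="$1" server_id="$2" i
  # Same table indexes as the loop in the script above (3 is skipped there too).
  for i in 1 2 4 5 6 7 8 9 10; do
    # "i 4" sets the row status to createAndGo(4).
    printf 'snmpset -c private -v 2c %s 1.3.6.1.4.1.1729.100.1.3.1.6.%s.%s i 4\n' \
      "$host" "$server_id" "$i"
  done
}
```

For example, gs_ctrl_set_cmds 172.16.5.51 116 prints nine commands that can be reviewed and then piped to sh.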
Managed objects: https://docs.genesys.com/Documentation/FR/latest/MLUG/SNMPobjs

VI. Installing and using Pushgateway
tar -zxvf pushgateway-1.0.0.linux-amd64.tar.gz
mv pushgateway-1.0.0.linux-amd64 /usr/local/pushgateway
Add the service file
cat > /usr/lib/systemd/system/pushgateway.service <<EOF
[Unit]
Description=pushgateway
Documentation=https://github.com/prometheus/pushgateway
After=network.target

[Service]
ExecStart=/usr/local/pushgateway/pushgateway
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl start pushgateway
systemctl enable pushgateway
Verify the service

Push a test sample
echo "some_metric 3.14" | curl --data-binary @- http://172.16.0.167:9091/metrics/job/some_job
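The push URL encodes the grouping key as path segments: /metrics/job/<job>, optionally followed by label name and value pairs such as /instance/<instance>. A minimal sketch of a helper that builds it; pgw_url and push_metric are hypothetical names, and the address is the one used in this document.

```shell
#!/usr/bin/env bash
# Sketch: build a Pushgateway URL for a job and an optional instance label.
# pgw_url and push_metric are hypothetical helper names.
# Arguments: <base URL> <job> [instance]
pgw_url() {
  local base="${1%/}" job="$2" instance="${3:-}"
  if [ -n "$instance" ]; then
    printf '%s/metrics/job/%s/instance/%s\n' "$base" "$job" "$instance"
  else
    printf '%s/metrics/job/%s\n' "$base" "$job"
  fi
}

# Push one sample; needs curl and a reachable Pushgateway.
# Arguments: <metric name> <value> <push URL>
push_metric() {
  printf '%s %s\n' "$1" "$2" | curl -s --data-binary @- "$3"
}
```

For example, push_metric some_metric 3.14 "$(pgw_url http://172.16.0.167:9091 some_job)" sends the same sample as the command above.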

VII. Using SNMPTrapServer (deprecated)

git clone https://toscode.gitee.com/sunzheng86/snmptrap-server.git
cd snmptrap-server
yum -y install go
export GOPROXY=https://goproxy.io
export GO111MODULE=on
go get
go build -o snmptrap-server main.go

mv miblist.txt miblist.txt.bak
snmptranslate -Tz -m /usr/share/snmp/mibs/GENESYS-SML-MIB-G7.txt >miblist.txt
vim config.yml
LOGCONF:
  level: info
  format: text
  director: logs
  link-name: current.log
TrapServer:
  ip: "172.16.5.51"
  port: 161
  version: "v2c" #v1, v2c, v3
  community: "public"
  timeout: 2 #seconds
  maxoids: 60
  mib_map_file: "miblist.txt"
ApiServer:
  api_port: 8070 # apiserver listen port
  api_read_timeout: 5 # apiserver read request timeout, seconds
  api_write_timeout: 5 # apiserver write request timeout, seconds
  api_web_root: "./webapp"
sender:
  pushgateway_url: "http://172.16.0.167:9091"
  job_name: "snmp_trap"
  # zabbix_host: "10.187.32.8"
  # zabbix_port: 10051
  # zbx_itme_key: "snmptraper.fallback"
  # sender_dir: "./zabbix_sender.exe"
  senders:
    # - "zabbix"
    - "pushgateway"
Run the service
./snmptrap-server -c config.yml

Send test data
snmptrap -v 2c -c public 172.16.0.167 '1234567' 1.3 sysLocation.0 s "test"


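There is currently a problem: the metric snmp_trap_pdu{host="IP of the sending host" OID=".x.x.x.x." value="xxxx" type="xxxxx" ts="time the message was received"} cannot be retrieved, so this approach has been dropped for now
VIII. AudioCodes SBC SNMP notes
Note: according to the quick-start document only a few MIB files are needed to retrieve metrics, but in testing, loading only those few returned no data. All MIB files from Mibs7.4.20220524 have to be copied over the default MIB directory of the ManageEngine MibBrowser client. ManageEngine MibBrowser recognizes MIB files that have no file extension. To retrieve metrics with the SNMP tools on CentOS 7, rename the files the same way, copy all of them into /usr/share/snmp/mibs, and restart the SNMP service

The screenshot below shows a failed recognition, in which case all MIB files need to be replaced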

IX. Network ping monitoring exporter for Prometheus
Project: https://github.com/2767321434/ping_exporter
A Prometheus exporter that tests network conditions with ping. It can monitor a specific HTTP endpoint, or a site such as Baidu, to track network fluctuation
Clone the project: git clone https://github.com/2767321434/ping_exporter
How to start it
./main -port 8889 -pingaddr www.baidu.com -count 4
After it starts, open 127.0.0.1:8889/metrics
The output metrics include the number of failed probes and the average latency
Metric data:
# TYPE ping_avg_time gauge (average latency)
ping_avg_time 0.22
# TYPE ping_lost gauge (packet loss)
ping_lost 0
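To check a single value from the command line, the metrics text can be filtered by metric name. A minimal sketch, assuming metrics without labels as in the sample above; metric_value is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read Prometheus text-format metrics on stdin and print the value
# of one unlabelled metric. metric_value is a hypothetical helper name.
# Exits with status 1 when the metric is not present.
metric_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }   # "# TYPE ..." lines have $1 == "#"
    END { if (!found) exit 1 }
  '
}

# Usage (requires the exporter to be running, so it is left commented out):
#   curl -s 127.0.0.1:8889/metrics | metric_value ping_lost
```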

X. Monitoring network packet loss with Prometheus and Pushgateway
Packet loss monitoring script
[root@gtcq-gt-monitor-prometheus-01 ~]# timeout 50 ping -q -A -s 500 -W 1000 -c 1000 10.1.32.95|grep transmitted|awk '{print $6}'
[root@gtcq-gt-monitor-prometheus-01 shell_script]# more icmp_gpu_monitor.sh
#!/bin/bash
#
#####################################
#@brief Purpose: monitor network packet loss and latency. -s is the size of one ping packet, -W is the reply timeout, -c is the number of packets sent
#@author xiajing
#@version 1.0
#@date 2021/01/13
#@log no
#####################################
#shell Env
# number of ping packets to send
c_times=200
# array of IP addresses
ip_arr=( 10.1.33.188 )
for (( i = 0; i < ${#ip_arr[@]}; ++i ))
do
    result=`timeout 16 ping -q -A -s 200 -W 250 -c $c_times ${ip_arr[i]}|grep transmitted|awk '{print $6,$10}'`
    if [ -z "$result" ]
    then
        value_lostpk=101
        value_rrt=1000
        echo "ykt_lostpk_gt_jd ${value_lostpk}" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/ykt_icmp/instance/${ip_arr[i]}
        echo "ykt_rrt_gt_jd ${value_rrt}" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/ykt_icmp/instance/${ip_arr[i]}
    else
        lostpk=$(echo $result|awk '{print $1}')
        rrt=$(echo $result|awk '{print $2}')
        value_lostpk=$(echo $lostpk | sed 's/%//g')
        value_rrt=$(echo $rrt |sed 's/ms//g')
        #value_rrt=$(($value_rrt/$c_times))
        value_rrt=$(printf "%.5f" `echo "scale=5;$value_rrt/$c_times"|bc`)
        echo "ykt_lostpk_gt_jd ${value_lostpk}" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/ykt_icmp/instance/${ip_arr[i]}
        echo "ykt_rrt_gt_jd ${value_rrt}" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/ykt_icmp/instance/${ip_arr[i]}
    fi
    echo ${ip_arr[i]}"==="$value_lostpk"==="$value_rrt
done
[root@gtcq-gt-monitor-prometheus-01 shell_script]#
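The script above picks the loss and total time out of the ping summary line by fixed field positions ($6 and $10). On Linux ping, the summary can gain an extra "+N errors" field, which shifts those positions, so a parser that looks for the fields by content may be more robust. A minimal sketch, with parse_ping_summary and avg_ms as hypothetical helper names; it uses awk instead of bc for the division.

```shell
#!/usr/bin/env bash
# Sketch: parse the summary line of `ping -q` output read from stdin.
# Prints "<loss percent> <total time in ms>".
# parse_ping_summary and avg_ms are hypothetical helper names.
parse_ping_summary() {
  awk '/transmitted/ {
    for (i = 1; i <= NF; i++) {
      if ($i ~ /%$/)     { loss = $i; sub(/%/, "", loss) }       # e.g. "0%" -> "0"
      if ($i == "time")  { t = $(i + 1); sub(/ms$/, "", t) }     # e.g. "3982ms" -> "3982"
    }
    print loss, t
  }'
}

# Total time divided by packet count, with 5 decimals (same as the bc step above).
avg_ms() {
  awk -v total="$1" -v n="$2" 'BEGIN { printf "%.5f\n", total / n }'
}
```

In the script, result could then be filled with ping ... | parse_ping_summary, and the two values split with awk as before.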


XI. Monitoring task processes with Prometheus and process-exporter
1. Download process-exporter
2. Install and deploy process-exporter
Install process-exporter on the machine to be monitored.
tar -xvf process-exporter-0.7.10.linux-amd64.tar.gz
mv process-exporter-0.7.10.linux-amd64 /usr/local/process-exporter
3. Write the configuration file
vi process-name.yaml
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'ces-wd.jar'
  - name: "{{.Matches}}"
    cmdline:
    - 'freeswitch'
4. Start the service
vim /usr/lib/systemd/system/process-exporter.service
[Unit]
Description=process_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/usr/local/process-exporter/process-exporter -config.path /usr/local/process-exporter/process-name.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start process-exporter.service
systemctl status process-exporter.service
systemctl enable process-exporter.service # start at boot
curl localhost:9256/metrics # verify the service
5. Configure Prometheus
Edit the Prometheus configuration file
  - job_name: 'process'
    static_configs:
      - targets: ['172.**.**.**:9256']
Example with multiple processes
…..
  - job_name: 'process'
    scrape_interval: 2m
    scrape_timeout: 120s
    static_configs:
    file_sd_configs:
      - files:
        - /data/prometheus/config/process_exporter.json
…..
[root@harbor config]# cat process_exporter.json
[
  {
    "labels": {
      "desc": "ces-wd",
      "group": "ces-wd.jar",
      "host_ip": "172.16.1.254",
      "hostname": "abc_baidu"
    },
    "targets": [
      "172.16.1.254:9256"
    ]
  },
  {
    "labels": {
      "desc": "freeswitch",
      "group": "freeswitch",
      "host_ip": "172.16.1.254",
      "hostname": "abc_baidu"
    },
    "targets": [
      "172.16.1.254:9256"
    ]
  }
]
Configure the alert rule:
groups:
- name: dol_alert_process_rule
  rules:
  - alert: Dolphinscheduler Alert ProcessDown # alert name
    expr: (namedprocess_namegroup_num_procs{groupname="map[:alertlib]"}) == 0
    for: 1m # how long the condition must hold before the alert is sent
    labels: # labels
      severity: error
    annotations: # annotations that explain the alert in detail
      summary: "dolphinscheduler Alert {{ $labels.instance }} has been down for more than 1 minute"
      description: "dolphinscheduler Alert has been down, This requires immediate action!"
Hot-reload the Prometheus alert rules
./promtool check config prometheus.yml
systemctl reload prometheus.service
Monitored metrics
XII. Installing and monitoring PostgreSQL
yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install postgresql12-server postgresql12
Initialize the database. The initialization command creates a directory named 12 under /var/lib/pgsql; 12 is the database version, and other versions use their own version number (9.4, 9.5)
/usr/pgsql-12/bin/postgresql-12-setup initdb
Enable at boot and start
systemctl enable postgresql-12 && systemctl start postgresql-12
After installation, a Linux login user named postgres is created by default
By default PostgreSQL does not allow remote connections; change the configuration so that remote users can connect
vim /var/lib/pgsql/12/data/postgresql.conf
listen_addresses = '*'
vim /var/lib/pgsql/12/data/pg_hba.conf
# IPv4 local connections:
host all all 0.0.0.0/0 md5
Restart to apply the change
systemctl restart postgresql-12
Change the database user password
Log in as the postgres user and set the password of the postgres database user
su - postgres
psql
postgres=# ALTER USER postgres WITH PASSWORD 'pgsql123';
postgres=# select usename,passwd from pg_shadow;
postgres=# SELECT datname FROM pg_database;
Use \q to leave the terminal
Connection test
bash-4.2$ psql -d postgres -U postgres -h 172.16.1.111
Password for user postgres:
psql (9.2.24, server 12.11)
WARNING: psql version 9.2, server version 12.0.
Some psql features might not work.
Type "help" for help.
postgres=#
Deploying the monitoring component
wget https://github.com/wrouesnel/postgres_exporter/releases/download/v0.5.1/postgres_exporter_v0.5.1_linux-amd64.tar.gz
mkdir /usr/local/postgres_exporter && tar -zxvf postgres_exporter_v0.5.1_linux-amd64.tar.gz -C /usr/local/postgres_exporter
cd /usr/local/postgres_exporter && mv postgres_exporter_v0.5.1_linux-amd64/* ./ && rm -fr postgres_exporter_v0.5.1_linux-amd64/
vim postgres_exporter.env
DATA_SOURCE_NAME="postgresql://postgres:pgsql123@172.16.1.111:5432/?sslmode=disable"
vim /etc/systemd/system/postgres_exporter.service
[Unit]
Description=Prometheus exporter for Postgresql
Wants=network-online.target
After=network-online.target

[Service]
User=postgres
Group=postgres
WorkingDirectory=/usr/local/postgres_exporter
EnvironmentFile=/usr/local/postgres_exporter/postgres_exporter.env
ExecStart=/usr/local/postgres_exporter/postgres_exporter
Restart=always

[Install]
WantedBy=multi-user.target
Start the service
systemctl start postgres_exporter.service
systemctl enable postgres_exporter.service
systemctl status postgres_exporter.service
Access test
curl http://172.16.1.111:9187/metrics
Prometheus configuration
  - job_name: 'postgres'
    static_configs:
      - targets: ['172.16.1.111:9187']
        labels:
          instance: postgresdb1
      # To add another postgres_exporter, add it like this:
      # - targets: ['10.10.xxx.56:9287']
      #   labels:
      #     instance: db2
Reload after the change: curl -X POST 172.16.1.84:9090/-/reload
Grafana dashboard IDs: 9628, 455


Prometheus alert rules
Alert rules
vi /data/prometheus/conf/rules/postgres.rules
groups:
- name: postgresql-monitoring-alerts
  rules:
  - alert: Alert! Postgresql down
    expr: pg_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql down"
      description: "Postgresql instance is down\n current value={{ $value }}"
  - alert: Alert! Postgresql restarted
    expr: time() - pg_postmaster_start_time_seconds < 60
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql restarted"
      description: "Postgresql restarted\n current value={{ $value }}"
  - alert: Alert! PostgresqlExporterError
    expr: pg_exporter_last_scrape_error > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql exporter error"
      description: "Postgresql exporter is showing errors. A query may be buggy in query.yaml\n current value={{ $value }}"
  - alert: Alert! Postgresql replication out of sync
    expr: pg_replication_lag > 30 and ON(instance) pg_replication_is_replica == 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql replication lag"
      description: "PostgreSQL replication lag is going up (> 30s)\n current value={{ $value }}"
  - alert: Alert! PostgresqlTableNotVacuumed
    expr: time() - pg_stat_user_tables_last_autovacuum > 60 * 60 * 24
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql table not vacuumed"
      description: "Table has not been vacuumed for 24 hours\n current value={{ $value }}"
  - alert: Alert! PostgresqlTableNotAnalyzed
    expr: time() - pg_stat_user_tables_last_autoanalyze > 60 * 60 * 24
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql table not analyzed"
      description: "Table has not been analyzed for 24 hours\n current value={{ $value }}"
  - alert: Alert! Postgresql too many connections
    expr: sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) > pg_settings_max_connections * 0.8
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql too many connections"
      description: "PostgreSQL instance has too many connections (> 80%).\n current value={{ $value }}"
  - alert: Alert! Postgresql too few connections
    expr: sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) < 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql not enough connections"
      description: "PostgreSQL instance should have more connections (> 5)\n current value={{ $value }}"
  - alert: Alert! Postgresql deadlocks
    expr: increase(pg_stat_database_deadlocks{datname!~"template.*|postgres"}[1m]) > 5
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql dead locks"
      description: "PostgreSQL has dead-locks\n current value={{ $value }}"
  - alert: Alert! Postgresql slow queries
    expr: pg_slow_queries > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql slow queries"
      description: "PostgreSQL executes slow queries\n current value={{ $value }}"
  - alert: Alert! Postgresql high rollback rate
    expr: rate(pg_stat_database_xact_rollback{datname!~"template.*"}[3m]) / rate(pg_stat_database_xact_commit{datname!~"template.*"}[3m]) > 0.02
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql high rollback rate"
      description: "Ratio of transactions being aborted compared to committed is > 2 %\n current value={{ $value }}"
  - alert: Alert! Postgresql low commit rate
    expr: rate(pg_stat_database_xact_commit[1m]) < 10
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql commit rate low"
      description: "Postgres seems to be processing very few transactions\n current value={{ $value }}"
  - alert: Alert! PostgresqlLowXidConsumption
    expr: rate(pg_txid_current[1m]) < 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql low XID consumption"
      description: "Postgresql seems to be consuming transaction IDs very slowly\n current value={{ $value }}"
  - alert: Alert! PostgresqlLowXlogConsumption
    expr: rate(pg_xlog_position_bytes[1m]) < 100
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql low XLOG consumption"
      description: "Postgres seems to be consuming XLOG very slowly\n current value={{ $value }}"
  - alert: Alert! PostgresqlWaleReplicationStopped
    expr: rate(pg_xlog_position_bytes[1m]) == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql WALE replication stopped"
      description: "WAL-E replication seems to be stopped\n current value={{ $value }}"
  - alert: Alert! PostgresqlHighRateStatementTimeout
    expr: rate(postgresql_errors_total{type="statement_timeout"}[1m]) > 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql high rate statement timeout"
      description: "Postgres transactions showing high rate of statement timeouts\n current value={{ $value }}"
  - alert: Alert! PostgresqlHighRateDeadlock
    expr: increase(postgresql_errors_total{type="deadlock_detected"}[1m]) > 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql high rate deadlock"
      description: "Postgres detected deadlocks\n current value={{ $value }}"
  - alert: Alert! PostgresqlReplicationLagBytes
    expr: (pg_xlog_position_bytes and pg_replication_is_replica == 0) - on(environment) group_right(instance) (pg_xlog_position_bytes and pg_replication_is_replica == 1) > 1e+09
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql replication lag bytes"
      description: "Postgres Replication lag (in bytes) is high\n current value={{ $value }}"
  - alert: Alert! PostgresqlUnusedReplicationSlot
    expr: pg_replication_slots_active == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql unused replication slot"
      description: "Unused Replication Slots\n current value={{ $value }}"
  - alert: Alert! PostgresqlTooManyDeadTuples
    expr: ((pg_stat_user_tables_n_dead_tup > 10000) / (pg_stat_user_tables_n_live_tup + pg_stat_user_tables_n_dead_tup)) >= 0.1 unless ON(instance) (pg_replication_is_replica == 1)
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql too many dead tuples"
      description: "PostgreSQL dead tuples is too large\n current value={{ $value }}"
  - alert: Alert! PostgresqlSplitBrain
    expr: count(pg_replication_is_replica == 0) != 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql split brain"
      description: "Split Brain, too many primary Postgresql databases in read-write mode\n current value={{ $value }}"
  - alert: Alert! PostgresqlPromotedNode
    expr: pg_replication_is_replica and changes(pg_replication_is_replica[1m]) > 0
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}} Postgresql promoted node"
      description: "Postgresql standby server has been promoted as primary node\n current value={{ $value }}"
  - alert: Alert! PostgresqlSslCompressionActive
    expr: sum(pg_stat_ssl_compression) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql SSL compression active"
      description: "Database connections with SSL compression enabled. This may add significant jitter in replication delay. Replicas should turn off SSL compression via `sslcompression=0` in `recovery.conf`.\n current value={{ $value }}"
  - alert: Alert! PostgresqlTooManyLocksAcquired
    expr: ((sum (pg_locks_count)) / (pg_settings_max_locks_per_transaction * pg_settings_max_connections)) > 0.20
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "{{$labels.instance}} Postgresql too many locks acquired"
      description: "Too many locks acquired on the database. If this alert happens frequently, we may need to increase the postgres setting max_locks_per_transaction.\n current value={{ $value }}"
十三、Asterisk监控
前提加载相关模块,使查询的命令可以生效
asterisk –rvv执行命令进入cli sip show peers 显示所有已定义的SIP peer sip show peers No such command 'sip show peers' (type 'core show help sip show' for other possible commands) 解决方法 module load chan_sip.so module reload chan_sip.so sip show peers sip show channels 显示所有活动的SIP通道 |
方式一
git clone https://github.com/robinmarechal/asterisk_exporter.git cd asterisk_exporter vim main.go enableAgentsCollector = kingpin.Flag("collector.agents", "Enable agents collector").Default("false").Bool() 将默认的main.go文件进行修改,将agents改为关闭,Asterisk16版本中没有这个查询指标,如果启动会导致采集服务失败 make ./asterisk_exporter <flags> ./asterisk_exporter –h 注意事项,make时需要go语言支持,且需要执行下面配置,否则提示/bin/promu无改命令 touch /root/go/bin/promu ln -s /root/go/bin/promu /bin/promu rm -f /root/go/bin/promu |

Start the service
./asterisk_exporter --collector.bridges --collector.calendars --collector.confbridges --collector.modules
Configure the system service
vim asterisk_exporter.service
[Unit]
Description=Asterisk call center system exporter for Prometheus
After=network.target

[Service]
User=asterisk
Group=asterisk
Type=simple
ExecStart=/usr/local/asterisk_exporter/asterisk_exporter \
    --web.listen-address=":9815" \
    --web.telemetry-path="/metrics" \
    --metrics.prefix="asterisk" \
    --asterisk.path="/usr/sbin/asterisk" \
    --collector.core \
    --collector.sip \
    --collector.bridges \
    --collector.calendars \
    --collector.confbridges \
    --collector.modules
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target
Start the service
cp asterisk_exporter.service /usr/lib/systemd/system/asterisk_exporter.service
systemctl enable asterisk_exporter
systemctl start asterisk_exporter
systemctl status asterisk_exporter
If the service fails to start, try replacing the ExecStart line with the following:
ExecStart=/usr/local/asterisk_exporter/asterisk_exporter --asterisk.path=/usr/sbin/asterisk --collector.core --collector.sip --collector.bridges --collector.calendars --collector.confbridges --collector.modules

Option 2
git clone https://github.com/tainguyenbp/asterisk_exporter.git

14. JVM Monitoring
1. Overview
JMX Exporter
https://github.com/prometheus/jmx_exporter
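It is an official Prometheus component that runs as a Java agent, collects metrics from the local JVM, and exposes them over HTTP. This is the officially recommended approach, and it provides process-level information such as CPU and memory usage.
Because jmx_exporter collects the target application's JMX metrics as an agent, the target application itself needs no changes.
To run the JMX exporter: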
java XXX -javaagent:/root/jmx_exporter/jmx_prometheus_javaagent-0.12.0.jar=3010:/root/jmx_exporter/config.yaml -jar XXX.jar
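The agent argument in the command above follows the pattern -javaagent:<agent jar>=<port>:<config file>. As a minimal sketch of that pattern (the helper name build_java_command is hypothetical and not part of jmx_exporter), the following Python assembles the same command line, which may help when several services need identical agent options:

```python
from typing import List


def build_java_command(agent_jar: str, port: int, config: str, app_jar: str) -> List[str]:
    """Return the argv list for starting a JAR with the JMX exporter agent attached."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # Agent argument format: -javaagent:<agent jar>=<port>:<config file>
    agent_arg = f"-javaagent:{agent_jar}={port}:{config}"
    return ["java", agent_arg, "-jar", app_jar]


# Paths and port taken from the example command in this section.
cmd = build_java_command(
    "/root/jmx_exporter/jmx_prometheus_javaagent-0.12.0.jar",
    3010,
    "/root/jmx_exporter/config.yaml",
    "XXX.jar",
)
print(" ".join(cmd))
```

Returning a list rather than a single string keeps the result safe to pass to subprocess.run without shell quoting.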
Installation and deployment
cd /usr/local
mkdir jvm_exporter
cd jvm_exporter
vim simple-config.yml
---
lowercaseOutputLabelNames: true
lowercaseOutputName: true
whitelistObjectNames: ["java.lang:type=OperatingSystem"]
rules:
  - pattern: 'java.lang<type=OperatingSystem><>((?!process_cpu_time)\w+):'
    name: os_$1
    type: GAUGE
    attrNameSnakeCase: true
Or use the configuration below
--
For example, suppose an RMS Java application is normally started with:
java -jar /data/rms/RMS.jar
To collect data with the JMX Exporter agent, change the command to:
java -javaagent:/usr/local/jmx_exporter/jmx_prometheus_javaagent-0.12.0.jar=3010:/usr/local/jmx_exporter/simple-config.yml -jar /data/rms/RMS.jar
Note: 3010 is the port the agent listens on; any free port can be used.
Or, taking the Zunyi Provident Fund IVR flow as an example:
java -javaagent:/usr/local/jvm_exporter/jmx_prometheus_javaagent-0.12.0.jar=3310:/usr/local/jvm_exporter/simple-config.yml -jar ./laihu-ivr-zygjj.jar
The JVM metrics can now be retrieved.
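To check the agent's output from a script rather than a browser, the text it serves can be parsed line by line. The following is a minimal sketch of a parser for the Prometheus text exposition format; the sample text and the metric names in it are illustrative assumptions, not output captured from a real agent, and escaped characters inside label values are left as-is.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric_name{label="value",...} value [timestamp]
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse exposition text into (name, labels, value) tuples, skipping comments."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a "# HELP" / "# TYPE" comment
        match = _LINE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, labels, value = match.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples


# Illustrative sample only; real metric names depend on the agent configuration.
SAMPLE = """\
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.5e+08
os_open_file_descriptor_count 42.0
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Against a live agent, the text could be fetched first, for example with urllib.request.urlopen("http://172.16.1.254:3310/metrics").read().decode(), using the target address from the scrape job in this section.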

Integrating with Prometheus
Add a new job:
  - job_name: 'jvm_exporter'
    static_configs:
      - targets: ['172.16.1.254:3310']
        labels:
          instance: jvm_exporter
Integrating with Grafana
https://grafana.com/grafana/dashboards/8563
Import dashboard template 8563.
Appendix 1: Generator yml module configuration reference
modules:
  module_name:  # The module name. You can have as many modules as you want.
    walk:       # List of OIDs to walk. Can also be SNMP object names or specific instances.
      - 1.3.6.1.2.1.2              # Same as "interfaces"
      - sysUpTime                  # Same as "1.3.6.1.2.1.1.3"
      - 1.3.6.1.2.1.31.1.1.1.6.40  # Instance of "ifHCInOctets" with index "40"

    version: 2  # SNMP version to use. Defaults to 2.
                # 1 will use GETNEXT, 2 and 3 use GETBULK.
    max_repetitions: 25  # How many objects to request with GET/GETBULK, defaults to 25.
                         # May need to be reduced for buggy devices.
    retries: 3   # How many times to retry a failed request, defaults to 3.
    timeout: 5s  # Timeout for each individual SNMP request, defaults to 5s.

    auth:
      # Community string is used with SNMP v1 and v2. Defaults to "public".
      community: public

      # v3 has different and more complex settings.
      # Which are required depends on the security_level.
      # The equivalent options on NetSNMP commands like snmpbulkwalk
      # and snmpget are also listed. See snmpcmd(1).
      username: user  # Required, no default. -u option to NetSNMP.
      security_level: noAuthNoPriv  # Defaults to noAuthNoPriv. -l option to NetSNMP.
                                    # Can be noAuthNoPriv, authNoPriv or authPriv.
      password: pass  # Has no default. Also known as authKey, -A option to NetSNMP.
                      # Required if security_level is authNoPriv or authPriv.
      auth_protocol: MD5  # MD5, SHA, SHA224, SHA256, SHA384, or SHA512. Defaults to MD5. -a option to NetSNMP.
                          # Used if security_level is authNoPriv or authPriv.
      priv_protocol: DES  # DES, AES, AES192, or AES256. Defaults to DES. -x option to NetSNMP.
                          # Used if security_level is authPriv.
      priv_password: otherPass  # Has no default. Also known as privKey, -X option to NetSNMP.
                                # Required if security_level is authPriv.
      context_name: context  # Has no default. -n option to NetSNMP.
                             # Required if context is configured on the device.

    lookups:  # Optional list of lookups to perform.
              # The default for `keep_source_indexes` is false. Indexes must be unique for this option to be used.

      # If the index of a table is bsnDot11EssIndex, usually that'd be the label
      # on the resulting metrics from that table. Instead, use the index to
      # lookup the bsnDot11EssSsid table entry and create a bsnDot11EssSsid label
      # with that value.
      - source_indexes: [bsnDot11EssIndex]
        lookup: bsnDot11EssSsid
        drop_source_indexes: false  # If true, delete source index labels for this lookup.
                                    # This avoids label clutter when the new index is unique.

    overrides:  # Allows for per-module overrides of bits of MIBs
      metricName:
        ignore: true  # Drops the metric from the output.
        regex_extracts:
          Temp:  # A new metric will be created appending this to the metricName to become metricNameTemp.
            - regex: '(.*)'  # Regex to extract a value from the returned SNMP walks's value.
              value: '$1'    # The result will be parsed as a float64, defaults to $1.
          Status:
            - regex: '.*Example'
              value: '1'  # The first entry whose regex matches and whose value parses wins.
            - regex: '.*'
              value: '0'
        type: DisplayString  # Override the metric type, possible types are:
                             #   gauge: An integer with type gauge.
                             #   counter: An integer with type counter.
                             #   OctetString: A bit string, rendered as 0xff34.
                             #   DateAndTime: An RFC 2579 DateAndTime byte sequence. If the device has no time zone data, UTC is used.
                             #   DisplayString: An ASCII or UTF-8 string.
                             #   PhysAddress48: A 48 bit MAC address, rendered as 00:01:02:03:04:ff.
                             #   Float: A 32 bit floating-point value with type gauge.
                             #   Double: A 64 bit floating-point value with type gauge.
                             #   InetAddressIPv4: An IPv4 address, rendered as 1.2.3.4.
                             #   InetAddressIPv6: An IPv6 address, rendered as 0102:0304:0506:0708:090A:0B0C:0D0E:0F10.
                             #   InetAddress: An InetAddress per RFC 4001. Must be preceded by an InetAddressType.
                             #   InetAddressMissingSize: An InetAddress that violates section 4.1 of RFC 4001 by
                             #       not having the size in the index. Must be preceded by an InetAddressType.
                             #   EnumAsInfo: An enum for which a single timeseries is created. Good for constant values.
                             #   EnumAsStateSet: An enum with a time series per state. Good for variable low-cardinality enums.
                             #   Bits: An RFC 2578 BITS construct, which produces a StateSet with a time series per bit.
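The regex_extracts comments above describe a "first match that parses wins" rule. The following Python sketch imitates that rule so the behaviour of the Temp and Status entries can be tried on sample strings. It is an approximation under stated assumptions: patterns are treated as matching the whole value, "$1" is expanded as a capture-group reference, and Python's re syntax stands in for the Go regular expressions the exporter actually uses. The function name apply_regex_extracts is hypothetical.

```python
import re
from typing import List, Optional, Tuple


def apply_regex_extracts(raw: str, entries: List[Tuple[str, str]]) -> Optional[float]:
    """Return the value of the first entry whose regex matches and whose result parses."""
    for pattern, template in entries:
        match = re.fullmatch(pattern, raw)  # assumption: whole-value match
        if match is None:
            continue
        # Translate Go-style "$1" references into Python's "\1" form before expanding.
        rendered = match.expand(re.sub(r"\$(\d+)", r"\\\1", template))
        try:
            return float(rendered)
        except ValueError:
            continue  # matched, but the result is not a number: try the next entry
    return None  # no entry produced a value, so no sample would be emitted


# The two entry lists from the configuration reference above.
TEMP = [("(.*)", "$1")]
STATUS = [(".*Example", "1"), (".*", "0")]

print(apply_regex_extracts("23.5", TEMP))
print(apply_regex_extracts("An Example", STATUS))
print(apply_regex_extracts("something else", STATUS))
```

The ordering of the Status entries matters: the catch-all '.*' entry must come last, otherwise it would shadow the more specific pattern.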
Appendix 2: Configuring snmptrapd on CentOS 7
vi /etc/snmp/snmptrapd.conf
authCommunity log,execute,net public
# systemctl start snmptrapd
# systemctl enable snmptrapd
Alternatively, run snmptrapd manually:
snmptrapd -C -c /etc/snmp/snmptrapd.conf -Lf /var/log/net-snmptrap.log
netstat -nlp|grep 162
pkill -9 snmptrapd
With the setup complete, send an SNMP trap to test whether it is received. The OID used below is the one for linkDown.
snmptrap -v 2c -c public 127.0.0.1 '' .1.3.6.1.6.3.1.1.5.3
Then check the log:
cat /var/log/snmptrap/snmptrap.log
Testing with the SnmpB tool
First log in to the SNMP server.

snmptrap -v 2c -c public 172.16.3.28 '1234567' .1.3.6.1.6.3.1.1.5.3 sysLocation.0 s "test"
172.16.3.28 is the IP address of the client, that is, the machine receiving the trap.

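Appendix 3: Adding a MIB library to SnmpB for testing
Under C:\Program Files (x86)\SnmpB\, create a project directory such as genesys and place the MIB files in it.

Restart SnmpB, add the new directory to the modules path setting, and then select the modules to load.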


Appendix 4: Prometheus queries
gServerStatus {job="genesys_snmp"} !=4
gServerName{gServerId="130",gServerName="MCP_2"}
{__name__=~"gServerName|gServerStatus"}
{__name__=~"gServerName|gServerStatus",gServerId=~"130"}
gsServersLastTrap{job="genesys_snmp"} == 1
gsServersLastTrap{gsServersLastTrap=~"[a-zA-Z_][a-zA-Z0-9_]*"} == 1
gsServersLastTrap{job=~"[a-zA-Z_][a-zA-Z0-9_]*"} == 1
gsServersLastTrap{gsServersLastTrap=~"[a-zA-Z_]_[a-zA-Z0-9_]:*"} == 1
gsServersLastTrap{gsServersLastTrap=~"[a-zA-Z_][a-zA-Z0-9_]* * *"} == 1
gsServersLastTrap{gsServersLastTrap="MCP_2:server is UP", instance="172.16.5.51", job="genesys_snmp"}
gsServersLastTrap{gsServersLastTrap="confserv:server is DOWN", instance="172.16.5.51", job="genesys_snmp"}
sum ({__name__=~"gServerName|gServerStatus",gServerId=~"130"} )-1
Aggregating without the instance label (all other labels are kept):
sum ({__name__=~"gServerName|gServerStatus",gServerId=~"130"} ) without (instance)
sum ({__name__=~"gServerName|gServerStatus",gServerId=~"130"} ) without (instance) <4
sum by (gServerName) ({__name__=~"gServerName|gServerStatus",gServerId=~"130"} ) !=4 >1
gServerStatus{gServerId=~'.*'} <4
{__name__=~"gServerName|gServerStatus",gServerId=~".*"}
Query only IDs 106 and 107:
gServerStatus{gServerId=~'.*',gServerId=~"(106|107)"} <4
===============
Exclude IDs 106, 107, and 118:
gServerStatus{gServerId=~'.*',gServerId!~"(106|107|118)"} <4
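The =~ and !~ matchers used above are fully anchored: the pattern must match the entire label value, not just part of it. The short Python sketch below mimics that behaviour with re.fullmatch; the list of gServerId values is hypothetical and includes "1060" to show that a longer ID is not caught by the (106|107) pattern.

```python
import re
from typing import List


def select(ids: List[str], pattern: str, negate: bool = False) -> List[str]:
    """Mimic label=~"pattern" (or label!~"pattern" when negate=True) with full anchoring."""
    return [i for i in ids if (re.fullmatch(pattern, i) is not None) != negate]


server_ids = ["106", "107", "118", "130", "1060"]  # hypothetical gServerId values

print(select(server_ids, "(106|107)"))                   # like gServerId=~"(106|107)"
print(select(server_ids, "(106|107|118)", negate=True))  # like gServerId!~"(106|107|118)"
```

Because of the anchoring, the extra gServerId=~'.*' matcher in the queries above does not change which series are selected.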
============
Join queries
Format:
0*gServerName + on(tempIndex) group_left(gServerStatus) gServerStatus
============
0*gServerName+ on(gServerId,instance,job) group_left(gServerStatus) gServerStatus
0*gServerStatus+ on(gServerId,instance,job) group_left(gServerName) gServerName
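To make the join easier to follow, here is a small Python simulation of the first expression, 0*gServerName + on(gServerId,instance,job) group_left(...) gServerStatus. It is a sketch under assumptions: gServerName is taken to be an info-style series with value 1 that carries the name as a label, the status value 4 is made up, and the label list inside group_left is ignored because the right-hand side is assumed to have no such label. The result keeps the labels of the left-hand (gServerName) series and the value of the matching gServerStatus series.

```python
from typing import Dict, List, Sequence, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, value)


def join_status_onto_name(names: List[Series], statuses: List[Series],
                          on: Sequence[str]) -> List[Series]:
    """Simulate 0*names + on(...) group_left() statuses."""
    status_by_key = {}
    for labels, value in statuses:
        key = tuple(labels.get(label, "") for label in on)
        if key in status_by_key:
            # The right-hand ("one") side must be unique per match key.
            raise ValueError(f"duplicate series for match key {key}")
        status_by_key[key] = value

    result = []
    for labels, value in names:
        key = tuple(labels.get(label, "") for label in on)
        if key in status_by_key:
            # 0 * name_value cancels the left value, leaving only the status.
            result.append((dict(labels), 0 * value + status_by_key[key]))
    return result


common = {"gServerId": "130", "instance": "172.16.5.51", "job": "genesys_snmp"}
names = [({**common, "gServerName": "MCP_2"}, 1.0)]
statuses = [(dict(common), 4.0)]  # hypothetical status value

for labels, value in join_status_onto_name(names, statuses, ("gServerId", "instance", "job")):
    print(labels["gServerName"], value)
```

Series on either side that have no partner with the same gServerId, instance, and job are dropped from the result, as in PromQL.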
============
1-sum(gServerName{gServerId=~"130"}) + sum({__name__=~"gServerName|gServerStatus",gServerId=~"130"}) - 1 != 4
Appendix 5: SIP trunk monitoring for the Benz project

Prometheus configuration:
Note: 14, 18, and 19 are three different trunk interfaces. An alert fires when utilization exceeds 80%.
Alert expression:
pstn_channel_use / (pstn_channel_idle + pstn_channel_use) > 0.8
######################
  - job_name: snmp
    honor_timestamps: true
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /snmp
    scheme: http
    static_configs:
      - targets:
          - "14"
          - "18"
          - "19"
    relabel_configs:
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        target_label: __param_target
        replacement: $1
        action: replace
      - source_labels: [__param_target]
        separator: ;
        regex: (.*)
        target_label: instance
        replacement: $1
        action: replace
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.185.85:9116
        action: replace
##################################
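As a worked example of the alert expression, the Python sketch below computes trunk utilization from the two channel counts and applies the 0.8 threshold. The channel counts are hypothetical, and the function names are introduced here for illustration only.

```python
from typing import Optional

THRESHOLD = 0.8  # alert when more than 80% of the channels are in use


def trunk_utilization(in_use: float, idle: float) -> Optional[float]:
    """Return in_use / (idle + in_use), or None when the trunk reports no channels."""
    total = idle + in_use
    if total == 0:
        # PromQL would yield NaN here, which never satisfies "> 0.8".
        return None
    return in_use / total


def should_alert(in_use: float, idle: float, threshold: float = THRESHOLD) -> bool:
    utilization = trunk_utilization(in_use, idle)
    return utilization is not None and utilization > threshold


# Hypothetical counts for one trunk: 25 busy channels, 5 idle channels.
print(trunk_utilization(25, 5))
print(should_alert(25, 5))
# 24 busy and 6 idle is exactly 80%, which is not above the threshold.
print(should_alert(24, 6))
```

The comparison is strict, so a trunk sitting at exactly 80% does not alert; if that case should fire as well, the expression would need >= instead of >.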




