Setting Up a Monitoring Server with Nagios
####################################
#nagios_configuration
#Author:楚霏
#Date: 2009-3-19
#Update:2009-8-11
#Env: Centos 5.3 x86_64
#Thanks to Sery for his help
####################################
1. Preparation
####################################
Environment: Centos 5.3 x86_64
Required software:
nagios-3.1.?.tar.gz
nagios-plugins-1.4.13.tar.gz
nrpe-2.12.tar.gz
httpd-2.2.??.tar.gz
gcc
glibc
glibc-common
gd
gd-devel
fetion20080910047-lin64.tar.gz
library64_linux.tar.gz
libstdc++-4.3.0-8.x86_64.rpm
####################################
#Download the software listed above into this directory
cd /usr/local/src/
####################################
2. Environment
####################################
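Both machines run Centos 5.3 x86_64
Monitoring server IP=10.0.0.52
Monitored host IP=10.0.0.166
On the monitoring server, Nagios runs as the user nagios. This user belongs to the nagios group and to the nagcmd group, which the Apache user (apache) also joins
The monitoring server needs nagios, nagios-plugins, nrpe and fetion
The monitored host only needs nagios-plugins and nrpe
A web environment with PHP and GD is not required by Nagios itself. It is there so that the monitoring status can be viewed in a browser, and the HTML pages shipped with Nagios need php+gd
All changes that add or remove hosts and services are made on the monitoring server
nagios.cfg on the monitoring server is the main configuration file. Among other things, it records where the other configuration files are located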
####################################
3. Installation and configuration
####################################
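(1) Install an apache+php+gd web environment on the monitoring server. Building from source is recommended and is not covered here; for convenience this guide uses yum
yum -y install gcc glibc glibc-common gd gd-devel httpd php php-gd libpng
(2) Install Nagios on the monitoring server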
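#Create the required user and groups
useradd -m nagios
groupadd nagcmd && usermod -a -G nagcmd nagios

#Add the web server user (apache) to the nagcmd group as well, so that the CGIs can submit external commands
usermod -a -G nagcmd apache

cd /usr/local/src/
tar xvf nagios-3.1.?.tar.gz ; cd nagios-3.1.?

#Optionally look at the build options first
./configure --help
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
make all

#Step 1: make install installs the main program, the CGIs and the HTML files
#Step 2: make install-init installs the init script, so that nagios can start at boot
#Step 3: make install-commandmode sets the permissions on the directory that holds the external command file
#Step 4: make install-config copies the sample configuration files into the nagios installation directory
make install
make install-init
make install-commandmode
make install-config

#Verify the installation: the prefix chosen above (here /usr/local/nagios) should contain the five directories etc, bin, sbin, share and var
#bin    executables, mainly the nagios binary itself
#etc    configuration files; right after installation it holds only the sample configuration
#sbin   the Nagios CGI files, that is, the files needed to run external commands from the web interface
#share  the Nagios web pages
#var    the Nagios log file, pid file and similar runtime files
ls /usr/local/nagios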
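The directory check above can also be scripted. The following is a minimal sketch, not part of Nagios: the function name check_nagios_dirs is made up for illustration, and it only assumes the five directory names listed above.

```shell
#!/bin/sh
# check_nagios_dirs [PREFIX]
# Hypothetical helper: report which of the five expected directories
# exist under PREFIX (default /usr/local/nagios).
check_nagios_dirs() {
    prefix="${1:-/usr/local/nagios}"
    missing=0
    for d in bin etc sbin share var; do
        if [ -d "$prefix/$d" ]; then
            echo "ok       $prefix/$d"
        else
            echo "MISSING  $prefix/$d"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every directory is present.
    [ "$missing" -eq 0 ]
}
```

Calling check_nagios_dirs /usr/local/nagios prints one line per directory and returns a non-zero status if any of them is absent, so it can be used in an if statement before continuing with the next step.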
(3) Configure the web interface
#This relies on httpd.conf containing
#----------------------------quoted text begin----------------------------
# Load config files from the config directory "/etc/httpd/conf.d".
Include conf.d/*.conf
#----------------------------quoted text end----------------------------

#A new file is then created under /<install path>/httpd/conf.d/ with the following content:
#----------------------------quoted text begin----------------------------
# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER
# Last Modified: 11-26-2005
#
# This file contains examples of entries that need
# to be incorporated into your Apache web server
# configuration file. Customize the paths, etc. as
# needed to fit your system.

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"

<Directory "/usr/local/nagios/sbin">
#  SSLRequireSSL
   AuthType Basic
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>

Alias /nagios "/usr/local/nagios/share"

<Directory "/usr/local/nagios/share">
#  SSLRequireSSL
   AuthType Basic
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>
#----------------------------quoted text end----------------------------

#With the yum-installed apache, the command below does all of the above (run it in the nagios source directory)
make install-webconf
#Create the user for web authentication
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
#Add index.php to the DirectoryIndex line in httpd.conf
#Other apache settings are not covered here
service httpd start
(4) Install Nagios Plugins
cd /usr/local/src/
tar xvf nagios-plugins-1.4.??.tar.gz && cd nagios-plugins-1.4.??
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
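(5) Register Nagios as a service and do a trial run
chkconfig --add nagios
chkconfig --level 3 nagios on

#Check the configuration files
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

#Make sure the nagios user is allowed to run the plugins
chown -R nagios:nagios /usr/local/nagios/libexec/

#If there are no errors, start the service
service nagios start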
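(6) Overview of the Nagios configuration files
#Main configuration file: nagios.cfg
#Log file
#Format: log_file=
#Example:
#log_file=/usr/local/nagios/var/nagios.log
#Object configuration files
#Format: cfg_file=
#Example:
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg
#Object configuration directory
#Format: cfg_dir=
#Example:
#cfg_dir=/usr/local/nagios/etc/switches
#Nagios user
#Format: nagios_user=
#Example:
#nagios_user=nagios
#The configuration file cgi.cfg controls the CGI scripts
#Objects are all the elements that can be monitored or notified.
#The object configuration files mainly include
#hosts.cfg            defines the monitored hosts
#hostgroups.cfg       defines groups of monitored hosts
#services.cfg         defines services
#servicegroups.cfg    defines service groups
#contacts.cfg         defines contacts
#contactgroups.cfg    defines contact groups
#timeperiods.cfg      defines time periods, for example 24x7 round-the-clock monitoring
#commands.cfg         defines commands
#servicedependency    defines service dependencies
#serviceescalation    defines service escalations
#hostdependency       defines host dependencies
#hostescalation       defines host escalations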
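Because nagios.cfg only points at the object files through cfg_file= lines, a mistyped path is a common source of startup errors. As a rough sketch (the two function names are made up here, and only the cfg_file= syntax shown above is assumed), the referenced files can be listed and checked like this:

```shell
#!/bin/sh
# list_cfg_files FILE
# Print the path of every cfg_file= entry that is not commented out.
list_cfg_files() {
    sed -n 's/^[[:space:]]*cfg_file[[:space:]]*=[[:space:]]*//p' "$1"
}

# missing_cfg_files FILE
# Print only those referenced object files that do not exist on disk.
missing_cfg_files() {
    list_cfg_files "$1" | while IFS= read -r f; do
        [ -f "$f" ] || echo "$f"
    done
}
```

For example, missing_cfg_files /usr/local/nagios/etc/nagios.cfg prints nothing when every referenced object file is in place. It does not look inside cfg_dir= directories.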
(7) Edit the configuration file
cd /usr/local/nagios/etc/
cp nagios.cfg nagios.cfg.chushibak
vi nagios.cfg
#Change this part
#----------------------------quoted text begin----------------------------
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#----------------------------quoted text end----------------------------

#to
#----------------------------quoted text begin----------------------------
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg

cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/servicegroups.cfg

cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
#----------------------------quoted text end----------------------------
(8) Create and edit the object configuration files
cd /usr/local/nagios/etc/objects
mkdir bak
mv contacts.cfg ./bak/
mv localhost.cfg ./bak/

#----------------------------quoted text begin----------------------------
cat << EOF >> hosts.cfg
define host{
        host_name                       10.0.0.52
        alias                           10.0.0.52
        address                         10.0.0.52
        max_check_attempts              5
        #check_interval                 1
        #retry_interval                 1
        check_period                    24x7
        contact_groups                  sa_groups
        notification_interval           30
        #first_notification_delay       #
        notification_period             24x7
        notification_options            d,u,r
        }

define host{
        host_name                       10.0.0.166
        alias                           10.0.0.166
        address                         10.0.0.166
        max_check_attempts              5
        #check_interval                 1
        #retry_interval                 1
        check_period                    24x7
        contact_groups                  sa_groups
        notification_interval           30
        #first_notification_delay       #
        notification_period             24x7
        notification_options            d,u,r
        }
EOF
#----------------------------quoted text end----------------------------

#----------------------------quoted text begin----------------------------
cat << EOF >> hostgroups.cfg
define hostgroup{
        hostgroup_name                  all_hosts
        alias                           all_hosts
        members                         10.0.0.52,10.0.0.166
        #notes                          note_string
        #notes_url                      url
        #action_url                     url
        }

define hostgroup{
        hostgroup_name                  http_hosts
        alias                           http_hosts
        members                         10.0.0.166
        #notes                          note_string
        #notes_url                      url
        #action_url                     url
        }
EOF
#----------------------------quoted text end----------------------------

#----------------------------quoted text begin----------------------------
cat << EOF >> contacts.cfg
define contact{
        contact_name                    cheng
        alias                           sa_cheng
        host_notifications_enabled      1
        service_notifications_enabled   1
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      notify-host-by-email,notify-host-by-sms
        service_notification_commands   notify-service-by-email,notify-service-by-sms
        email                           yxcx@yahoo.cn
        pager                           13712345678
        can_submit_commands             1
        #retain_status_information      [0/1]
        #retain_nonstatus_information   [0/1]
        }
EOF
#----------------------------quoted text end----------------------------

#----------------------------quoted text begin----------------------------
cat << EOF >> contactgroups.cfg
define contactgroup{
        contactgroup_name               sa_groups
        alias                           sa_groups
        members                         cheng
        #contactgroup_members           contactgroups
        }
EOF
#----------------------------quoted text end----------------------------

#The commands named in check_command below must be defined in the commands configuration file, or in the nrpe configuration file for checks that run through nrpe
#max_check_attempts is best set to 3 or 4, so that checks are not sensitive enough to raise false alarms; an SMS for every dropped packet would be unbearable
#check_interval and retry_interval are given in minutes; adjust them per check as appropriate
#notification_interval is the number of minutes between repeated alerts once a problem has been detected
#Notification options:
#d=send notifications on a DOWN state (host is down)
#w=send notifications on a WARNING state
#c=send notifications on a CRITICAL state
#u=send notifications on an UNREACHABLE or UNKNOWN state
#r=send notifications on recoveries (OK state)
#f=send notifications when the host or service starts and stops flapping
#s=send notifications when scheduled downtime starts and ends

#----------------------------quoted text begin----------------------------
cat << EOF >> services.cfg
#monitor hosts
define service{
        host_name                       10.0.0.166
        service_description             check_ftp
        check_command                   check_ftp
        max_check_attempts              3
        check_interval                  10
        retry_interval                  5
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }
EOF
#----------------------------quoted text end----------------------------

#----------------------------quoted text begin----------------------------
cat << EOF >> servicegroups.cfg
#monitor all_hosts
define service{
        hostgroup_name                  all_hosts
        service_description             check_host-alive
        check_command                   check_ping
        max_check_attempts              5
        check_interval                  3
        retry_interval                  1
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }

define service{
        hostgroup_name                  all_hosts
        service_description             check_df
        check_command                   check_nrpe!check_df
        max_check_attempts              4
        check_interval                  1440
        retry_interval                  5
        check_period                    24x7
        notification_interval           1440
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }

define service{
        hostgroup_name                  all_hosts
        service_description             check_load
        check_command                   check_nrpe!check_load
        max_check_attempts              5
        check_interval                  5
        retry_interval                  5
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }

define service{
        hostgroup_name                  all_hosts
        service_description             check_zombie_procs
        check_command                   check_nrpe!check_zombie_procs
        max_check_attempts              5
        check_interval                  5
        retry_interval                  5
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }

define service{
        hostgroup_name                  all_hosts
        service_description             check_total_procs
        check_command                   check_nrpe!check_total_procs
        max_check_attempts              5
        check_interval                  5
        retry_interval                  5
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }

define service{
        hostgroup_name                  all_hosts
        service_description             check_ssh
        check_command                   check_ssh
        max_check_attempts              3
        check_interval                  60
        retry_interval                  5
        check_period                    24x7
        notification_interval           60
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }

#monitor http_hosts
define service{
        hostgroup_name                  http_hosts
        service_description             check_http
        check_command                   check_http
        max_check_attempts              4
        check_interval                  3
        retry_interval                  1
        check_period                    24x7
        notification_interval           30
        notification_period             24x7
        notification_options            w,u,c
        #contacts                       contacts(*)
        contact_groups                  sa_groups
        }
EOF
#----------------------------quoted text end----------------------------
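To see how check_interval, retry_interval and max_check_attempts combine, it helps to work out how long a failure takes to produce an alert. The sketch below assumes the default interval_length of 60 seconds (so the values are minutes) and the usual behaviour that a failing service is rechecked every retry_interval until max_check_attempts is reached. The function names are illustrative only.

```shell
#!/bin/sh
# minutes_to_hard_state MAX_CHECK_ATTEMPTS RETRY_INTERVAL
# Minutes between the first failed check and the final retry that
# makes the problem "hard" and triggers a notification.
minutes_to_hard_state() {
    echo $(( ($1 - 1) * $2 ))
}

# worst_case_alert_delay CHECK_INTERVAL MAX_CHECK_ATTEMPTS RETRY_INTERVAL
# Upper bound on the delay when the failure happens right after a
# successful check: one full check_interval plus the retries.
worst_case_alert_delay() {
    echo $(( $1 + ($2 - 1) * $3 ))
}
```

With the check_http values above (check_interval 3, max_check_attempts 4, retry_interval 1), minutes_to_hard_state 4 1 gives 3 and worst_case_alert_delay 3 4 1 gives 6, so an outage should be reported within about six minutes.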
####################################
####################################
(7) Install nrpe on the monitoring server
cd /usr/local/src/
tar xvf nrpe-2.??.tar.gz && cd nrpe-2.??
./configure --prefix=/usr/local/nrpe

#When configure finishes, it prints a summary of the settings
#----------------------------quoted text begin----------------------------
General Options:
 -------------------------
 NRPE port:    5666
 NRPE user:    nagios
 NRPE group:   nagios
 Nagios user:  nagios
 Nagios group: nagios
#----------------------------quoted text end----------------------------
make
make install

#Copy a few plugins so that nrpe can work
cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec/
cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec/
cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec/
cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec/
cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec/
chown -R nagios:nagios /usr/local/nrpe/libexec/

#Add the following at a suitable place in /usr/local/nagios/etc/objects/commands.cfg; here it went between check_ssh and check_dhcp
vi /usr/local/nagios/etc/objects/commands.cfg
#----------------------------quoted text begin----------------------------
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
#----------------------------quoted text end----------------------------
####################################
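The check_nrpe definition is easier to follow once the macros are filled in. The function below is only an illustration of that substitution, not something Nagios provides: it assumes $USER1$ is /usr/local/nagios/libexec (the value normally set in resource.cfg for this install prefix) and that the text after the "!" in check_command becomes $ARG1$.

```shell
#!/bin/sh
# expand_check_nrpe CHECK_COMMAND HOST_ADDRESS [USER1]
# Hypothetical helper: show the command line that results from a
# service entry such as "check_nrpe!check_df" for a given host.
expand_check_nrpe() {
    check_command="$1"
    host_address="$2"
    user1="${3:-/usr/local/nagios/libexec}"
    # Text before the first "!" is the command name, text after it is $ARG1$.
    command_name="${check_command%%!*}"
    arg1="${check_command#*!}"
    echo "$user1/$command_name -H $host_address -c $arg1"
}
```

For the check_df service and the monitored host 10.0.0.166, expand_check_nrpe 'check_nrpe!check_df' 10.0.0.166 prints /usr/local/nagios/libexec/check_nrpe -H 10.0.0.166 -c check_df. Running that printed command by hand as the nagios user is a quick way to test the nrpe connection.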
####################################
(8) Configure nrpe
mkdir /usr/local/nrpe/etc
cp sample-config/nrpe.cfg /usr/local/nrpe/etc/

#Change the following options in /usr/local/nrpe/etc/nrpe.cfg
#server_address= set according to your environment
#allowed_hosts= the hosts that are allowed to query this nrpe daemon
#----------------------------quoted text begin----------------------------
server_address=127.0.0.1
allowed_hosts=127.0.0.1
#----------------------------quoted text end----------------------------

#Adjust the command section to your setup. For disks, the check_hda1 command is commented out here and replaced by a check of all disks
#----------------------------quoted text begin----------------------------
#command[check_hda1]=/usr/local/nrpe/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_df]=/usr/local/nrpe/libexec/check_disk -w 20% -c 10%
#----------------------------quoted text end----------------------------

#Register nrpe as a service
cp init-script /etc/init.d/nrpe
chmod 755 /etc/init.d/nrpe
chkconfig --add nrpe
chkconfig --level 3 nrpe on
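A monitoring server that is missing from allowed_hosts gets no answer from nrpe, so it can be worth checking the list before debugging anything else. This is a small sketch with a made-up function name. It assumes allowed_hosts is a comma-separated list of plain addresses, as in the examples in this guide, and does an exact match only.

```shell
#!/bin/sh
# host_allowed NRPE_CFG IP
# Succeed if IP is listed in the last active allowed_hosts= line of NRPE_CFG.
host_allowed() {
    cfg="$1"
    ip="$2"
    sed -n 's/^[[:space:]]*allowed_hosts[[:space:]]*=//p' "$cfg" |
        tail -n 1 | tr ',' '\n' | tr -d ' \t' | grep -Fxq -- "$ip"
}
```

For example, host_allowed /usr/local/nrpe/etc/nrpe.cfg 10.0.0.52 returns success only if 10.0.0.52 appears in the list.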
####################################
####################################
(9) Install the Fetion robot
cd /usr/local/src/
rpm -Uvh libstdc++-4.3.0-8.x86_64.rpm
tar xvf fetion20090406003-linux.tar.gz
tar xvf library_linux.tar.gz
mv install ../sms
mv libACE* /usr/local/lib64/
mv libcrypto.so.0.9.8 /usr/local/lib64/
mv libssl.so.0.9.8 /usr/local/lib64/
echo "/usr/local/lib64/" >> /etc/ld.so.conf
ldconfig
chown -R nagios:nagios /usr/local/sms
chmod 755 /usr/local/sms/fetion

#It is best to switch to the nagios user and send a test message
su nagios
#13744444444 is the mobile number that sends the SMS
#jiubugaosuni is the password for 13744444444
#Replace 13712345678 with your own mobile number
/usr/local/sms/fetion --mobile=13744444444 --pwd=jiubugaosuni --to=13712345678 --msg-utf8=test
#Do not forget to return to root
exit

#Add the SMS notification commands; here they went below the email section
vi /usr/local/nagios/etc/objects/commands.cfg
#----------------------------quoted text begin----------------------------
# 'notify-host-by-sms' command definition
define command{
        command_name    notify-host-by-sms
        command_line    /usr/local/sms/fetion --mobile=13744444444 --pwd=jiubugaosuni --to=$CONTACTPAGER$ --msg-utf8="$NOTIFICATIONTYPE$ $HOSTNAME$ is $HOSTSTATE$ info: $HOSTOUTPUT$"
        }

# 'notify-service-by-sms' command definition
define command{
        command_name    notify-service-by-sms
        command_line    /usr/local/sms/fetion --mobile=13744444444 --pwd=jiubugaosuni --to=$CONTACTPAGER$ --msg-utf8="$NOTIFICATIONTYPE$: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$"
        }
#----------------------------quoted text end----------------------------

#Update contacts.cfg and contactgroups.cfg accordingly, mainly the mobile number
####################################
####################################
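(10) Restart the nagios service and check that the monitoring server is monitoring itself
#Check the configuration files and look for errors in the output
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios restart
#Open http://ip/nagios/ in a browser to see the result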
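When the verification output is long, the error count is easy to miss. The sketch below pulls it out of the text. It assumes the output of nagios -v ends with a summary line of the form "Total Errors:   0", which is how Nagios 3.x prints it as far as I know; the function name is made up.

```shell
#!/bin/sh
# verify_error_count
# Read the output of "nagios -v" on stdin and print the number that
# follows "Total Errors:". Prints nothing if no such line is found.
verify_error_count() {
    sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

A possible use is to run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | verify_error_count and restart the service only when the result is 0.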
####################################
####################################
(11) Install nagios-plugins and nrpe on the monitored host
useradd -m nagios
cd /usr/local/src/
tar xvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
cd ../
tar xvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make
make install
mkdir /usr/local/nagios/etc/
cp sample-config/nrpe.cfg /usr/local/nagios/etc/

#Change the following options in /usr/local/nagios/etc/nrpe.cfg
#server_address= set according to your environment
#allowed_hosts= the hosts that are allowed to query this nrpe daemon
#----------------------------quoted text begin----------------------------
server_address=10.0.0.166
allowed_hosts=127.0.0.1,10.0.0.52,10.0.0.166
#----------------------------quoted text end----------------------------
#Adjust the command section to your setup. For disks, the check_hda1 command is commented out here and replaced by a check of all disks
#Both packages were built with the default prefix, so the plugins are in /usr/local/nagios/libexec on this host
#----------------------------quoted text begin----------------------------
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_df]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
#----------------------------quoted text end----------------------------
cp init-script /etc/init.d/nrpe
chmod 755 /etc/init.d/nrpe
chkconfig --add nrpe
chkconfig --level 3 nrpe on
####################################
####################################
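(12) How to add a monitored host
#Steps:
#a. Make sure nagios-plugins and nrpe are correctly installed on the monitored host
#b. Define the host in hosts.cfg. Copy an existing host definition and adjust it
#c. In hostgroups.cfg, add the host to the groups it should belong to
#d. If a service that needs monitoring is not already defined in servicegroups.cfg, define it in services.cfg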
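Step b can be scripted instead of copied by hand. The following is a minimal sketch that prints a host definition with the same settings as the two hosts defined earlier. The function name and the example host 10.0.0.167 are hypothetical, and sa_groups is assumed as the default contact group.

```shell
#!/bin/sh
# print_host_definition NAME ADDRESS [CONTACT_GROUP]
# Print a "define host" block modelled on the hosts.cfg entries in this guide.
print_host_definition() {
    name="$1"
    address="$2"
    group="${3:-sa_groups}"
    cat <<EOF
define host{
        host_name                       $name
        alias                           $name
        address                         $address
        max_check_attempts              5
        check_period                    24x7
        contact_groups                  $group
        notification_interval           30
        notification_options            d,u,r
        }
EOF
}
```

Something like print_host_definition 10.0.0.167 10.0.0.167 >> /usr/local/nagios/etc/objects/hosts.cfg would append the new host. The host still has to be added to hostgroups.cfg by hand, and the configuration should be verified with nagios -v before restarting.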
####################################
####################################
(13) Notes on monitoring a mysql server
#When building nagios-plugins, add --with-mysql=/usr/local/mysql (your mysql installation path)
#./configure --with-mysql=/usr/local/mysql --with-nagios-user=nagios --with-nagios-group=nagios

#Do the following on the monitored host
#The check queries an empty database nrpe as a user nrpe that only has SELECT privilege. It is equivalent to mysqladmin -u user --password='password' status -i 2
mysql -p
#----------------------------quoted text begin----------------------------
mysql> create database nrpe;
mysql> grant select on nrpe.* to nrpe@localhost identified by 'password' with grant option;
mysql> grant select on nrpe.* to nrpe@monitoring_server_ip identified by 'password' with grant option;
#----------------------------quoted text end----------------------------
#Trial run; it prints the mysql status
/usr/local/nagios/libexec/check_mysql -u nrpe -p password -d nrpe
#Trial run on the monitoring server (needs mysql_client)
/usr/local/nagios/libexec/check_mysql -H 10.0.0.166 -u nrpe -p password -d nrpe
####################################
####################################
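(14) A web server can also be monitored through nrpe
#In services.cfg on the monitoring server, wherever check_http would be called, call check_nrpe!check_http instead
#Add the following line to nrpe.cfg on the monitored host
#----------------------------quoted text begin----------------------------
command[check_http]=/usr/local/nagios/libexec/check_http -H www.chengyongxu.com -u /index.php
#----------------------------quoted text end----------------------------
#In other words, the check requests one page on this web server; if the page responds normally, the web service is considered healthy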
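Whether the page counts as "normal" is reported by the plugin through its exit code, which nrpe passes back to Nagios unchanged. The standard plugin return codes are 0, 1, 2 and 3. The helper below (a made-up name, for manual testing only) turns the code into the state name shown in the web interface.

```shell
#!/bin/sh
# plugin_state_name EXIT_CODE
# Map a Nagios plugin exit code to its state name.
plugin_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNKNOWN (unexpected exit code $1)" ;;
    esac
}
```

After running the check_http command from nrpe.cfg by hand on the monitored host, plugin_state_name $? shows which state Nagios would record for it.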