I have been using self-hosted Kolab Groupware every day for quite a while now, so the need arose to monitor process activity and system resources using the Monit utility.
A couple of words about Monit
Monit is a simple and robust utility for monitoring and automatic maintenance, supported on Linux, BSD and OS X.
Software installation
Debian Wheezy currently provides Monit 5.4.
To install it, execute the following command:
$ sudo apt-get install monit
The Monit daemon will be started at boot time. Alternatively, you can use the standard System V init script to manage the service.
Initial configuration
Configuration files are located in the /etc/monit/ directory. Default settings are stored in the /etc/monit/monitrc file, which I strongly suggest reading.
Custom configuration will be stored in the /etc/monit/conf.d/ directory.
I will override several important settings using the local.conf file.
Modified settings
- Set the email address to root@example.org
- Slightly change the default template
- Define the mail server as localhost
- Set the default interval to 120 seconds with an initial delay of 180 seconds
- Enable the local web server to take advantage of additional functionality (currently commented out)
$ sudo cat /etc/monit/conf.d/local.conf
# define e-mail recipient
set alert root@example.org

# define e-mail template
set mail-format {
  from: monit@$HOST
  subject: monit alert -- $EVENT $SERVICE
  message: $EVENT Service $SERVICE
    Date: $DATE
    Action: $ACTION
    Host: $HOST
    Description: $DESCRIPTION
}

# define server
set mailserver localhost

# define interval and initial delay
set daemon 120 with start delay 180

# set web server for local management
# set httpd port 2812 and
#   use the address localhost
#   allow localhost
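With set daemon 120 with start delay 180, the timing of checks is simple arithmetic. The Python sketch below (function names are hypothetical) assumes the first cycle runs as soon as the start delay expires and ignores how long each cycle takes; it shows when the first cycles run and how soon a rule written as "for 3 cycles" can fire after the daemon starts.

```python
# Sketch only: models "set daemon 120 with start delay 180" under two
# assumptions: the first cycle starts right after the start delay, and
# the time each cycle spends running checks is ignored.


def cycle_start_times(cycles, interval=120, start_delay=180):
    """Seconds after daemon start at which each of the first `cycles` cycles begins."""
    return [start_delay + i * interval for i in range(cycles)]


def earliest_alert(fail_cycles, interval=120, start_delay=180):
    """Earliest moment a rule 'if ... for N cycles then alert' can fire,
    for a condition that is already true when the daemon starts."""
    if fail_cycles < 1:
        raise ValueError("fail_cycles must be at least 1")
    return cycle_start_times(fail_cycles, interval, start_delay)[-1]


if __name__ == "__main__":
    print(cycle_start_times(3))  # [180, 300, 420]
    print(earliest_alert(3))     # 420
```

Under these assumptions, a problem that is present right from boot is reported about seven minutes after Monit starts.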
Command-line operations
Verify configuration syntax
To check the configuration syntax, execute the following command:
$ sudo monit -t
Control file syntax OK
Start, Stop, Restart actions
Start all services and enable monitoring for them.
$ sudo monit start all
Start all services in the resources group and enable monitoring for them.
$ sudo monit -g resources start
Start the rootfs service and enable monitoring for it.
$ sudo monit start rootfs
You can initiate the stop action in the same way, which will stop the services and disable monitoring for them, or execute the restart action to stop and start the corresponding services.
Monitor and unmonitor actions
Monitor all services.
$ sudo monit monitor all
Monitor all services in the resources group.
$ sudo monit -g resources monitor
Monitor the rootfs service.
$ sudo monit monitor rootfs
Use the unmonitor action to disable monitoring for the corresponding services.
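If you drive these actions from scripts, the invocation forms above (everything, a whole group with -g, or a single service) can be assembled by one small helper. This is a hypothetical Python sketch that only builds the argument list; running it, for example with subprocess.run and sudo, is left to the caller.

```python
# Hypothetical helper: builds the monit command lines shown above.
# It only returns an argv list; executing it (e.g. with subprocess.run,
# prefixed with "sudo") is up to the caller.

ACTIONS = {"start", "stop", "restart", "monitor", "unmonitor"}


def monit_argv(action, service="all", group=None):
    """argv for 'monit <action> <service|all>' or 'monit -g <group> <action>'."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if group is not None:
        return ["monit", "-g", group, action]
    return ["monit", action, service]


if __name__ == "__main__":
    print(monit_argv("start"))                     # ['monit', 'start', 'all']
    print(monit_argv("start", group="resources"))  # ['monit', '-g', 'resources', 'start']
    print(monit_argv("monitor", "rootfs"))         # ['monit', 'monitor', 'rootfs']
```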
Status action
Print service status.
$ sudo monit status
The Monit daemon 5.6 uptime: 27d 0h 47m

System 'server'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.26] [0.43] [0.48]
  cpu                               12.8%us 2.6%sy 0.0%wa
  memory usage                      2934772 kB [36.4%]
  swap usage                        2897376 kB [35.0%]
  data collected                    Mon, 29 Sep 2014 22:47:49

Filesystem 'rootfs'
  status                            Accessible
  monitoring status                 Monitored
  permission                        660
  uid                               0
  gid                               6
  filesystem flags                  0x1000
  block size                        4096 B
  blocks total                      17161862 [67038.5 MB]
  blocks free for non superuser     7327797 [28624.2 MB] [42.7%]
  blocks free total                 8205352 [32052.2 MB] [47.8%]
  inodes total                      4374528
  inodes free                       4151728 [94.9%]
  data collected                    Mon, 29 Sep 2014 22:47:49
Summary action
Print short service summary.
$ sudo monit summary
The Monit daemon 5.6 uptime: 27d 0h 48m

System 'server'                     Running
Filesystem 'rootfs'                 Accessible
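The summary format is easy to consume from scripts. The following Python sketch parses the Monit 5.6 layout shown above into a dictionary; the function name is hypothetical, and the layout may differ in other Monit releases, so treat the regular expression as an assumption to verify against your own output.

```python
import re

# Assumption: summary lines look like "System 'server'      Running",
# i.e. service type, quoted service name, then the status text.
SUMMARY_LINE = re.compile(r"^(\w+) '([^']+)'\s+(.+?)\s*$")


def parse_summary(text):
    """Map (service type, service name) -> status from 'monit summary' output."""
    result = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            kind, name, status = match.groups()
            result[(kind, name)] = status
    return result


# Output taken from the summary example above.
SAMPLE = """The Monit daemon 5.6 uptime: 27d 0h 48m

System 'server'                     Running
Filesystem 'rootfs'                 Accessible
"""

if __name__ == "__main__":
    print(parse_summary(SAMPLE))
```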
Reload action
Reload configuration and reinitialize Monit daemon.
$ sudo monit reload
Quit action
Terminate Monit daemon.
$ sudo monit quit
monit daemon with pid [5248] killed
Monitor filesystems
I am using a VPS because of its easy backup and restore process, so I have only one filesystem, on the /dev/root device, which I will monitor as a service named rootfs.
Monit will generate an alert and send an email if space or inode usage on the rootfs filesystem (on the /dev/root device) exceeds 80 percent of the available capacity.
$ sudo cat /etc/monit/conf.d/filesystems.conf
check filesystem rootfs with path /dev/root
  group resources
  if space usage > 80% then alert
  if inode usage > 80% then alert
The above service is placed in the resources group for easier management.
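Both thresholds compare used capacity against the total. A Python sketch of the same calculation using os.statvfs follows (Unix only; function names are hypothetical). It assumes that blocks reserved for root count as free, which is my assumption, not a statement about Monit's internals; using the non-superuser figure instead would give a higher usage value.

```python
import os


def usage_percent(total, free):
    """Percentage of capacity in use, given total and free counts."""
    if total <= 0:
        return 0.0  # some filesystems report no inode limit at all
    return 100.0 * (total - free) / total


def filesystem_usage(path="/"):
    """(space %, inode %) in use for the filesystem holding `path` (Unix only)."""
    st = os.statvfs(path)
    # Assumption: f_bfree (all free blocks, including root-reserved ones)
    # is treated as free space.
    space = usage_percent(st.f_blocks, st.f_bfree)
    inodes = usage_percent(st.f_files, st.f_ffree)
    return space, inodes


if __name__ == "__main__":
    # Figures from the 'monit status' output above:
    # 17161862 blocks in total, 8205352 free; 4374528 inodes, 4151728 free.
    print(round(usage_percent(17161862, 8205352), 1))  # 52.2
    print(round(usage_percent(4374528, 4151728), 1))   # 5.1
    space, inodes = filesystem_usage("/")
    print(f"space {space:.1f}%  inodes {inodes:.1f}%")
```

With the figures from the status output, both values are far below the 80% thresholds.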
Monitor system resources
The following configuration defines a service named server, as it describes resource usage for the whole mail server.
Monit will check memory usage and, if it exceeds 80% of the available capacity for three consecutive cycles, send an alert email.
A recovery message will be sent after two consecutive successful cycles, to limit the number of messages. The same rules apply to the remaining system resources, each with its own threshold.
The system I am using has four processors, so an alert will be generated as soon as the five-minute load average exceeds five (this rule has no cycle count).
$ sudo cat /etc/monit/conf.d/resources.conf
check system server
  group resources
  if memory usage > 80% for 3 cycles then alert
  else if succeeded for 2 cycles then alert
  if swap usage > 50% for 3 cycles then alert
  else if succeeded for 2 cycles then alert
  if cpu(wait) > 30% for 3 cycles then alert
  else if succeeded for 2 cycles then alert
  if cpu(system) > 60% for 3 cycles then alert
  else if succeeded for 2 cycles then alert
  if cpu(user) > 60% for 3 cycles then alert
  else if succeeded for 2 cycles then alert
  if loadavg(5min) > 5 then alert
  else if succeeded for 2 cycles then alert
The above service is placed in the resources group for easier management.
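Each "for 3 cycles ... else if succeeded for 2 cycles" pair behaves like a small state machine. The Python class below is a simplified model, not Monit's implementation: it assumes both counts refer to consecutive cycles and ignores the other forms described in the manual. The class name is hypothetical.

```python
class CycleRule:
    """Simplified model of
        if <condition> for N cycles then alert
        else if succeeded for M cycles then alert
    Assumption: N and M count consecutive cycles."""

    def __init__(self, fail_cycles=3, recover_cycles=2):
        self.fail_cycles = fail_cycles
        self.recover_cycles = recover_cycles
        self.failing = 0      # consecutive cycles with the condition true
        self.passing = 0      # consecutive cycles with the condition false
        self.alerted = False  # True once the failure alert has been sent

    def check(self, failed):
        """Feed one cycle's result; return 'alert', 'recovered' or None."""
        if failed:
            self.failing += 1
            self.passing = 0
            if not self.alerted and self.failing >= self.fail_cycles:
                self.alerted = True
                return "alert"
        else:
            self.passing += 1
            self.failing = 0
            if self.alerted and self.passing >= self.recover_cycles:
                self.alerted = False
                return "recovered"
        return None


if __name__ == "__main__":
    rule = CycleRule()
    # memory usage > 80% in cycles 1-4, back to normal in cycles 5-6
    samples = [True, True, True, True, False, False]
    print([rule.check(s) for s in samples])
    # [None, None, 'alert', None, None, 'recovered']
```

A short spike that lasts only one or two cycles never reaches the alert. The load average rule has no "for" clause, so it corresponds to CycleRule(fail_cycles=1); on this four-processor machine its threshold of 5 means 1.25 per processor.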
Monitor system services
cron
cron is a daemon used to execute user-specified tasks at scheduled time.
Monit will use the specified PID file (/var/run/crond.pid) to monitor the cron service and restart it if it stops for any reason.
A configuration change will generate an alert message, while a permission or ownership issue will generate an alert message and disable further monitoring.
GID 102 translates to the crontab group.
$ sudo cat /etc/monit/conf.d/cron.conf
check process cron with pidfile /var/run/crond.pid
  group system
  group scheduled-tasks
  start program = "/usr/sbin/service cron start"
  stop program = "/usr/sbin/service cron stop"
  if 3 restarts within 5 cycles then timeout
  depends on cron_bin
  depends on cron_rc
  depends on cron_rc.d
  depends on cron_rc.daily
  depends on cron_rc.hourly
  depends on cron_rc.monthly
  depends on cron_rc.weekly
  depends on cron_rc.spool

check file cron_bin with path /usr/sbin/cron
  group scheduled-tasks
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file cron_rc with path /etc/crontab
  group scheduled-tasks
  if failed checksum then alert
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory cron_rc.d with path /etc/cron.d
  group scheduled-tasks
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory cron_rc.daily with path /etc/cron.daily
  group scheduled-tasks
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory cron_rc.hourly with path /etc/cron.hourly
  group scheduled-tasks
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory cron_rc.monthly with path /etc/cron.monthly
  group scheduled-tasks
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory cron_rc.weekly with path /etc/cron.weekly
  group scheduled-tasks
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory cron_rc.spool with path /var/spool/cron/crontabs
  group scheduled-tasks
  if changed timestamp then alert
  if failed permission 1730 then unmonitor
  if failed uid root then unmonitor
  if failed gid 102 then unmonitor
The above service is placed in the system and scheduled-tasks groups for easier management.
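At its core, a "check process ... with pidfile" test reads the PID from the file and asks the kernel whether that process still exists. A minimal Python sketch of that idea follows. The function name is hypothetical, and this is not Monit's implementation; Monit does more on top of it, such as the restart handling configured above.

```python
import os


def process_alive(pidfile):
    """Sketch: read a PID from `pidfile` and report whether that process exists.

    Signal 0 performs only the existence/permission check; nothing is delivered.
    """
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or malformed pid file
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # stale pid file: no such process
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    print(process_alive("/var/run/crond.pid"))
```

A stale PID file (the daemon died without removing it) is exactly the case where a restart is needed, which is why the sketch checks the process and not just the file.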
rsyslogd
rsyslogd is a message logging service.
$ sudo cat /etc/monit/conf.d/rsyslogd.conf
check process rsyslog with pidfile /var/run/rsyslogd.pid
  group system
  group logging
  start program = "/usr/sbin/service rsyslog start"
  stop program = "/usr/sbin/service rsyslog stop"
  if 3 restarts within 5 cycles then timeout
  depends on rsyslog_bin
  depends on rsyslog_rc
  depends on rsyslog_rc.d

check file rsyslog_bin with path /usr/sbin/rsyslogd
  group logging
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file rsyslog_rc with path /etc/rsyslog.conf
  group logging
  if failed checksum then alert
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory rsyslog_rc.d with path /etc/rsyslog.d
  group logging
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the system and logging groups for easier management.
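The "failed checksum" rules detect any modification of a binary or configuration file. The Python sketch below (hypothetical names) assumes the baseline digest is recorded on the first check and compared on every later one. It uses MD5 via hashlib, one of the hash types Monit supports, but the choice of algorithm here is only illustrative.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hex digest of a file, read in chunks so large binaries are fine."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class ChecksumWatch:
    """Sketch: remember a file's digest on the first call, report changes afterwards."""

    def __init__(self, path, algorithm="md5"):
        self.path = path
        self.algorithm = algorithm
        self.baseline = None

    def changed(self):
        current = file_digest(self.path, self.algorithm)
        if self.baseline is None:
            self.baseline = current  # first check: just record the baseline
            return False
        return current != self.baseline
```

Because the baseline is never updated, a modified /usr/sbin/rsyslogd keeps failing the test until someone looks at it, which matches the intent of pairing this rule with unmonitor.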
ntpd
The Network Time Protocol daemon check is extended with port monitoring.
$ sudo cat /etc/monit/conf.d/ntpd.conf
check process ntp with pidfile /var/run/ntpd.pid
  group system
  group time
  start program = "/usr/sbin/service ntp start"
  stop program = "/usr/sbin/service ntp stop"
  if failed port 123 type udp then restart
  if 3 restarts within 5 cycles then timeout
  depends on ntp_bin
  depends on ntp_rc

check file ntp_bin with path /usr/sbin/ntpd
  group time
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file ntp_rc with path /etc/ntp.conf
  group time
  if failed checksum then alert
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the system and time groups for easier management.
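UDP is connectionless, so a UDP service can only be observed as alive when it answers a request. To illustrate what a meaningful probe of port 123 involves, here is a Python sketch that sends a standard NTP version 3 client packet and waits for a reply. The function names are hypothetical and this is not how Monit implements its UDP test.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_request():
    """48-byte client packet: LI=0, version 3, mode 3 (client) -> first byte 0x1B."""
    return bytes([0x1B]) + bytes(47)


def transmit_time(packet):
    """Server transmit timestamp (bytes 40-47) converted to Unix time."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def ntp_alive(host="127.0.0.1", port=123, timeout=2.0):
    """Sketch: True if an NTP server answers a client request on UDP `port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(ntp_request(), (host, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # timeout, ICMP port unreachable, ...
            return False
    return len(data) >= 48
```

On the server itself, ntp_alive() returning True and transmit_time() of the reply being close to time.time() together show that ntpd is answering.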
OpenSSH
The OpenSSH service check is extended with match statements that test the content of the configuration file. I assume it is self-explanatory.
$ sudo cat /etc/monit/conf.d/openssh-server.conf
check process openssh with pidfile /var/run/sshd.pid
  group system
  group sshd
  start program = "/usr/sbin/service ssh start"
  stop program = "/usr/sbin/service ssh stop"
  if failed port 22 with protocol ssh then restart
  if 3 restarts within 5 cycles then timeout
  depends on openssh_bin
  depends on openssh_sftp_bin
  depends on openssh_rsa_key
  depends on openssh_dsa_key
  depends on openssh_rc

check file openssh_bin with path /usr/sbin/sshd
  group sshd
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file openssh_sftp_bin with path /usr/lib/openssh/sftp-server
  group sshd
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file openssh_rsa_key with path /etc/ssh/ssh_host_rsa_key
  group sshd
  if failed checksum then unmonitor
  if failed permission 600 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file openssh_dsa_key with path /etc/ssh/ssh_host_dsa_key
  group sshd
  if failed checksum then unmonitor
  if failed permission 600 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file openssh_rc with path /etc/ssh/sshd_config
  group sshd
  if failed checksum then alert
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
  if not match "^PasswordAuthentication no" then alert
  if not match "^PubkeyAuthentication yes" then alert
  if not match "^PermitRootLogin no" then alert
The above service is placed in the system and sshd groups for easier management.
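The three match rules require each directive to appear at the start of a line in sshd_config. The same test can be reproduced with Python's re module in multiline mode, which is handy for checking a configuration before deploying it. This is a sketch: the function name is hypothetical, and it scans the whole file at once, which may not be exactly how Monit's content test reads the file.

```python
import re

# Sketch: the same three patterns as the "if not match" rules above.
REQUIRED = (
    r"^PasswordAuthentication no",
    r"^PubkeyAuthentication yes",
    r"^PermitRootLogin no",
)


def missing_settings(config_text, patterns=REQUIRED):
    """Return the patterns that match no line of config_text.

    re.MULTILINE makes ^ anchor at the start of every line, so a
    commented-out '#PasswordAuthentication no' does not count.
    """
    return [p for p in patterns if not re.search(p, config_text, re.MULTILINE)]


if __name__ == "__main__":
    sample = "PermitRootLogin no\nPubkeyAuthentication yes\n#PasswordAuthentication no\n"
    print(missing_settings(sample))  # ['^PasswordAuthentication no']
```

Run against the real /etc/ssh/sshd_config, an empty list means all three directives are set.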
Monitor Kolab services
MySQL
MySQL is an open-source database server used by a wide range of Kolab services.
UID 106 translates to the mysql user and GID 106 to the mysql group.
This is the first configuration in this article that uses the unixsocket statement.
$ sudo cat /etc/monit/conf.d/mysql.conf
check process mysql with pidfile /var/run/mysqld/mysqld.pid
  group kolab
  group database
  start program = "/usr/sbin/service mysql start"
  stop program = "/usr/sbin/service mysql stop"
  if failed port 3306 protocol mysql then restart
  if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql then restart
  if 3 restarts within 5 cycles then timeout
  depends on mysql_bin
  depends on mysql_rc
  depends on mysql_sys_maint
  depends on mysql_data

check file mysql_bin with path /usr/sbin/mysqld
  group database
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file mysql_rc with path /etc/mysql/my.cnf
  group database
  if failed checksum then alert
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file mysql_sys_maint with path /etc/mysql/debian.cnf
  group database
  if failed checksum then unmonitor
  if failed permission 600 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory mysql_data with path /var/lib/mysql
  group database
  if failed permission 700 then unmonitor
  if failed uid 106 then unmonitor
  if failed gid 110 then unmonitor
The above service is placed in the kolab and database groups for easier management.
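Both MySQL tests above speak the MySQL protocol, once over TCP port 3306 and once over the Unix socket. The Python sketch below (hypothetical names) shows the two ingredients separately: connecting to a Unix stream socket, and reading the first packet a MySQL server sends, whose payload starts with protocol version 10 after a 4-byte header (3-byte length, 1-byte sequence id). It is an illustration, not a description of Monit's protocol test.

```python
import socket


def unix_socket_accepts(path, timeout=2.0):
    """Sketch: True if something is listening on the Unix stream socket at `path`."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            return False
    return True


def mysql_greeting_ok(sock):
    """Read the server's first packet header and check for protocol version 10.

    Packet layout: 3-byte little-endian payload length, 1-byte sequence id
    (0 for the greeting), then the payload, which starts with the version.
    """
    with sock.makefile("rb") as stream:
        header = stream.read(5)  # blocks until 5 bytes or EOF
    return len(header) == 5 and header[3] == 0 and header[4] == 10
```

On the server, connecting to /var/run/mysqld/mysqld.sock and passing that socket to mysql_greeting_ok() combines the two.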
Apache
Apache is an open-source HTTP server used to serve the user and administration web interfaces.
Note that I am checking the HTTPS port.
$ sudo cat /etc/monit/conf.d/apache.conf
check process apache with pidfile /var/run/apache2.pid
  group kolab
  group web-server
  start program = "/usr/sbin/service apache2 start"
  stop program = "/usr/sbin/service apache2 stop"
  if failed port 443 then restart
  if 3 restarts within 5 cycles then timeout
  depends on apache2_bin
  depends on apache2_rc
  depends on apache2_rc_mods
  depends on apache2_rc_sites

check file apache2_bin with path /usr/sbin/apache2.prefork
  group web-server
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory apache2_rc with path /etc/apache2
  group web-server
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory apache2_rc_mods with path /etc/apache2/mods-enabled
  group web-server
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory apache2_rc_sites with path /etc/apache2/sites-enabled
  group web-server
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the kolab and web-server groups for easier management.
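One thing to keep in mind about the "changed timestamp" rules on directories: a directory's modification time changes when entries are added, removed or renamed (for example when a2ensite or a2enmod adds a symlink), not when an existing file inside it is rewritten in place. The Python sketch below (hypothetical class name) watches st_mtime to demonstrate this; which timestamp Monit compares exactly is described in its manual.

```python
import os


class TimestampWatch:
    """Sketch: report when a path's modification time changes between checks.

    For a directory, st_mtime changes when entries are created, removed or
    renamed inside it, not when an existing file's content is rewritten.
    """

    def __init__(self, path):
        self.path = path
        self.last = os.stat(path).st_mtime_ns

    def changed(self):
        current = os.stat(self.path).st_mtime_ns
        if current != self.last:
            self.last = current  # report each change only once
            return True
        return False
```

Editors that save through a temporary file and a rename do touch the directory, while appending to a file (echo >> file) does not; monitoring important files individually, as done for the binaries above, covers both cases.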
Kolab daemon
This is the heart of the whole Kolab unified communication and collaboration system as it is responsible for data synchronization between different services.
UID 413 translates to the kolab-n user and GID 412 to the kolab group.
$ sudo cat /etc/monit/conf.d/kolab-server.conf
check process kolab-server with pidfile /var/run/kolabd/kolabd.pid
  group kolab
  group kolab-daemon
  start program = "/usr/sbin/service kolab-server start"
  stop program = "/usr/sbin/service kolab-server stop"
  if 3 restarts within 5 cycles then timeout
  depends on kolab-daemon_bin
  depends on kolab-daemon_rc

check file kolab-daemon_bin with path /usr/sbin/kolabd
  group kolab-daemon
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file kolab-daemon_rc with path /etc/kolab/kolab.conf
  group kolab-daemon
  if failed checksum then alert
  if failed permission 640 then unmonitor
  if failed uid 413 then unmonitor
  if failed gid 412 then unmonitor
The above service is placed in the kolab and kolab-daemon groups for easier management.
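The permission, uid and gid rules compare a file's mode bits and numeric owner with expected values. The Python sketch below mirrors them with os.stat (the function name is hypothetical; Unix only). It accepts names as well as numeric ids, and the mode comparison includes special bits such as the sticky bit in 1730. To see which names numeric ids like 413 or 412 map to on your system, pwd.getpwuid and grp.getgrgid will tell you, as will getent passwd 413 from the shell.

```python
import grp
import os
import pwd
import stat


def _to_id(value, lookup):
    """Accept a numeric id, or a name such as 'root' resolved via `lookup`."""
    return value if isinstance(value, int) else lookup(value)


def failed_tests(path, mode=None, uid=None, gid=None):
    """Sketch: names of the failing permission/uid/gid tests for `path`."""
    st = os.stat(path)
    failures = []
    # S_IMODE keeps the permission bits plus setuid/setgid/sticky (mode & 0o7777)
    if mode is not None and stat.S_IMODE(st.st_mode) != mode:
        failures.append("permission")
    if uid is not None and st.st_uid != _to_id(uid, lambda n: pwd.getpwnam(n).pw_uid):
        failures.append("uid")
    if gid is not None and st.st_gid != _to_id(gid, lambda n: grp.getgrnam(n).gr_gid):
        failures.append("gid")
    return failures
```

For the Kolab configuration file this would be failed_tests("/etc/kolab/kolab.conf", mode=0o640, uid=413, gid=412); an empty list means all three tests pass.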
Kolab saslauthd
Kolab saslauthd is the SASL authentication daemon for multi-domain Kolab deployments.
$ sudo cat /etc/monit/conf.d/kolab-saslauthd.conf
check process kolab-saslauthd with pidfile /var/run/kolab-saslauthd/kolab-saslauthd.pid
  group kolab
  group kolab-saslauthd
  start program = "/usr/sbin/service kolab-saslauthd start"
  stop program = "/usr/sbin/service kolab-saslauthd stop"
  if 3 restarts within 5 cycles then timeout
  depends on kolab-saslauthd_bin

check file kolab-saslauthd_bin with path /usr/sbin/kolab-saslauthd
  group kolab-saslauthd
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the kolab and kolab-saslauthd groups for easier management.
It would also be possible to test the /var/run/saslauthd/mux socket, but I will just leave it alone for now.
Wallace
Wallace is a content-filtering daemon.
$ sudo cat /etc/monit/conf.d/wallace.conf
check process wallace with pidfile /var/run/wallaced/wallaced.pid
  group kolab
  group wallace
  start program = "/usr/sbin/service wallace start"
  stop program = "/usr/sbin/service wallace stop"
  #if failed port 10026 then restart
  if 3 restarts within 5 cycles then timeout
  depends on wallace_bin

check file wallace_bin with path /usr/sbin/wallaced
  group wallace
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the kolab and wallace groups for easier management.
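Every process check in this post includes "if 3 restarts within 5 cycles then timeout", which keeps Monit from endlessly restarting a service that keeps crashing. A simplified Python model follows; the class name is hypothetical and the window semantics are my assumption, not a description of Monit's code.

```python
from collections import deque


class RestartLimiter:
    """Simplified model of 'if 3 restarts within 5 cycles then timeout'.

    Assumptions: the window is the last `cycles` cycles including the
    current one, and once the limit is hit the service stays timed out
    (no more restarts) until someone intervenes.
    """

    def __init__(self, restarts=3, cycles=5):
        self.restarts = restarts
        self.window = deque(maxlen=cycles)  # one bool per cycle: restarted?
        self.timed_out = False

    def cycle(self, restarted):
        """Record one cycle; return True once the service has timed out."""
        if self.timed_out:
            return True
        self.window.append(bool(restarted))
        if sum(self.window) >= self.restarts:
            self.timed_out = True
        return self.timed_out


if __name__ == "__main__":
    crashing = RestartLimiter()
    print([crashing.cycle(True) for _ in range(3)])  # [False, False, True]
```

In the sketch a timed-out limiter stays timed out; in Monit, as far as I understand, you bring such a service back with monit start or monit monitor.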
ClamAV
ClamAV is open-source, cross-platform antivirus software; clamd is its scanning daemon.
$ sudo cat /etc/monit/conf.d/clamav.conf
check process clamav with pidfile /var/run/clamav/clamd.pid
  group system
  group antivirus
  start program = "/usr/sbin/service clamav-daemon start"
  stop program = "/usr/sbin/service clamav-daemon stop"
  if 3 restarts within 5 cycles then timeout
  #if failed unixsocket /var/run/clamav/clamd.ctl type udp then alert
  depends on clamav_bin
  depends on clamav_rc

check file clamav_bin with path /usr/sbin/clamd
  group antivirus
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file clamav_rc with path /etc/clamav/clamd.conf
  group antivirus
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the system and antivirus groups for easier management.
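The commented-out unixsocket test above uses type udp, but clamd's local socket is, as far as I know, a stream socket. A simple liveness probe can send clamd's PING command and expect PONG. The Python sketch below (hypothetical function names) does that over the Unix socket from the commented line.

```python
import socket


def clamd_ping(sock):
    """Send clamd's PING command on a connected socket and expect PONG."""
    # The 'n' prefix selects newline-terminated commands and replies.
    sock.sendall(b"nPING\n")
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(64)
        if not chunk:
            break  # connection closed before a full reply arrived
        reply += chunk
    return reply.strip() == b"PONG"


def clamd_alive(path="/var/run/clamav/clamd.ctl", timeout=2.0):
    """Sketch: connect to clamd's local stream socket and PING it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
            return clamd_ping(sock)
        except OSError:
            return False
```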
Freshclam
Freshclam is a tool used to periodically update the ClamAV virus databases.
$ sudo cat /etc/monit/conf.d/freshclam.conf
check process freshclam with pidfile /var/run/clamav/freshclam.pid
  group system
  group antivirus-updater
  start program = "/usr/sbin/service clamav-freshclam start"
  stop program = "/usr/sbin/service clamav-freshclam stop"
  if 3 restarts within 5 cycles then timeout
  depends on freshclam_bin
  depends on freshclam_rc

check file freshclam_bin with path /usr/bin/freshclam
  group antivirus-updater
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file freshclam_rc with path /etc/clamav/freshclam.conf
  group antivirus-updater
  if failed permission 444 then unmonitor
  if failed uid 110 then unmonitor
  if failed gid 4 then unmonitor
The above service is placed in the system and antivirus-updater groups for easier management.
amavisd-new
Amavis is a high-performance interface between the Postfix mail server and content filters: SpamAssassin for spam classification and ClamAV for antivirus protection.
$ sudo cat /etc/monit/conf.d/amavisd-new.conf
check process amavisd-new with pidfile /var/run/amavis/amavisd.pid
  group kolab
  group content-filter
  start program = "/usr/sbin/service amavis start"
  stop program = "/usr/sbin/service amavis stop"
  if 3 restarts within 5 cycles then timeout
  #if failed port 10024 type tcp then restart
  #if failed unixsocket /var/lib/amavis/amavisd.sock type udp then alert
  depends on amavisd-new_bin
  depends on amavisd-new_rc

check file amavisd-new_bin with path /usr/sbin/amavisd-new
  group content-filter
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory amavisd-new_rc with path /etc/amavis/
  group content-filter
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the kolab and content-filter groups for easier management.
The main Directory Server daemon
The main Directory Server daemon is ns-slapd, the server process of the 389 LDAP Directory Server.
$ sudo cat /etc/monit/conf.d/dirsrv.conf
check process dirsrv with pidfile /var/run/dirsrv/slapd-xmail.pid
  group kolab
  group dirsrv
  start program = "/usr/sbin/service dirsrv start"
  stop program = "/usr/sbin/service dirsrv stop"
  if 3 restarts within 5 cycles then timeout
  if failed port 389 type tcp then restart
  depends on dirsrv_bin
  depends on dirsrv_rc

check file dirsrv_bin with path /usr/sbin/ns-slapd
  group dirsrv
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory dirsrv_rc with path /etc/dirsrv/
  group dirsrv
  if changed timestamp then alert
The above service is placed in the kolab and dirsrv groups for easier management.
SpamAssassin
SpamAssassin is a content filter used for spam filtering; spamd is its daemon.
$ sudo cat /etc/monit/conf.d/spamd.conf
check process spamd with pidfile /var/run/spamd.pid
  group system
  group spamd
  start program = "/usr/sbin/service spamassassin start"
  stop program = "/usr/sbin/service spamassassin stop"
  if 3 restarts within 5 cycles then timeout
  #if failed port 783 type tcp then restart
  depends on spamd_bin
  depends on spamd_rc

check file spamd_bin with path /usr/sbin/spamd
  group spamd
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory spamd_rc with path /etc/spamassassin/
  group spamd
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the system and spamd groups for easier management.
Cyrus IMAP/POP3 daemons
The cyrus-imapd daemon is responsible for IMAP/POP3 communication. The check below tests the IMAP (143), IMAPS (993) and ManageSieve (4190) ports.
$ sudo cat /etc/monit/conf.d/cyrus-imapd.conf
check process cyrus-imapd with pidfile /var/run/cyrus-master.pid
  group kolab
  group cyrus-imapd
  start program = "/usr/sbin/service cyrus-imapd start"
  stop program = "/usr/sbin/service cyrus-imapd stop"
  if 3 restarts within 5 cycles then timeout
  if failed port 143 type tcp then restart
  if failed port 4190 type tcp then restart
  if failed port 993 type tcp then restart
  depends on cyrus-imapd_bin
  depends on cyrus-imapd_rc

check file cyrus-imapd_bin with path /usr/lib/cyrus-imapd/cyrus-master
  group cyrus-imapd
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check file cyrus-imapd_rc with path /etc/cyrus.conf
  group cyrus-imapd
  if failed checksum then alert
  if failed permission 644 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the kolab and cyrus-imapd groups for easier management.
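Without a protocol clause, each "if failed port ... type tcp" rule amounts to a connection test. The Python sketch below (hypothetical names) performs the same kind of check for a list of ports such as Cyrus's 143, 4190 and 993. Note that for 993 (IMAPS) a successful connect only shows that the listener accepts TCP connections, not that the TLS handshake works.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Sketch: True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def failed_ports(host, ports, timeout=2.0):
    """Return the ports that refuse the connection or time out."""
    return [p for p in ports if not tcp_port_open(host, p, timeout)]
```

On the mail server, failed_ports("localhost", [143, 4190, 993]) returning an empty list corresponds to all three Monit port tests passing.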
Postfix
Postfix is an open-source mail transfer agent used to route and deliver electronic mail.
$ sudo cat /etc/monit/conf.d/postfix.conf
check process postfix with pidfile /var/spool/postfix/pid/master.pid
  group kolab
  group mta
  start program = "/usr/sbin/service postfix start"
  stop program = "/usr/sbin/service postfix stop"
  if 3 restarts within 5 cycles then timeout
  if failed port 25 type tcp then restart
  #if failed port 10025 type tcp then restart
  #if failed port 10027 type tcp then restart
  if failed port 587 type tcp then restart
  depends on postfix_bin
  depends on postfix_rc

check file postfix_bin with path /usr/lib/postfix/master
  group mta
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

check directory postfix_rc with path /etc/postfix/
  group mta
  if changed timestamp then alert
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor
The above service is placed in the kolab and mta groups for easier management.
Ending notes
This blog post is definitely too long, so I will just mention that a similar configuration can be used to monitor other integrated solutions like ISPConfig, or custom specialized setups.
In my opinion Monit is a great utility that simplifies system and service monitoring. Additionally, it provides interesting proactive features, such as restarting a service or executing an arbitrary program when selected tests fail.
Everything is described in the manual page.
$ man monit