Using Monit

Monit is a process monitoring daemon that can be used on embedded devices or servers to monitor gateway agents. The example below covers installing, configuring and running a process and network monitor on a Beaglebone. Process monitoring can watch resources such as memory usage, CPU usage, number of child processes and CPU time to make sure everything is operating normally. We also monitor the main XMPP server to detect when it goes down.

Beaglebone

Installation

Run as root:

apt-get install monit
apt-get install ssmtp mailutils mpack

Configure

Configuration is done in the monitrc file, which is located in the following directory:

/etc/monit

Logs are stored in:

/var/log/monit.log

Enable the web server configuration tool by uncommenting the "set httpd" and "allow admin:monit" lines in monitrc, so that the section reads:

set httpd port 2812 and
#    use address localhost  # only accept connection from localhost
#    allow localhost        # allow localhost to connect to the server and
allow admin:monit      # require user 'admin' with password 'monit'
#    allow @monit           # allow users of group 'monit' to connect (rw)
#    allow @users readonly  # allow users of group 'users' to connect readonly

Enable mail alerts in monitrc:

set mailserver localhost

set alert admin@someaddress.com

Set up email on the Beaglebone by editing /etc/ssmtp/ssmtp.conf:

FromLineOverride=YES
mailhub=smtp.gmail.com:587
AuthUser=emailaccount@gmail.com
AuthPass=yourpassword
useSTARTTLS=YES

Example Monit configuration to add to monitrc:

  check process SLIPstream
    matching "SLIPstream" 
    start program = "/bin/sh /root/start_SLIP.sh" with timeout 10 seconds
    stop program  = "/bin/sh /root/stop_SLIP.sh" with timeout 10 seconds
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 100.0 MB for 5 cycles then restart
    if children > 10 then restart
    if loadavg(5min) greater than 10 for 8 cycles then restart 

  check process pymio_gateway
    matching "python2.7 ./MioGateway.py" 
    start program = "/bin/sh /root/start_aloha.sh" with timeout 10 seconds
    stop program  = "/bin/sh /root/stop_aloha.sh" with timeout 10 seconds
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 100.0 MB for 5 cycles then restart
    if children > 10 then restart
    if loadavg(5min) greater than 10 for 8 cycles then restart 

  check host sensor.andrew.cmu.edu with address sensor.andrew.cmu.edu
    if failed icmp type echo count 3 with timeout 30 seconds then exec "/bin/sh /root/stop_aloha.sh" 

In our setup, the restart action appears to call only the "start" program, so make sure the start script runs the stop script first before launching the process.
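
One way to arrange this, as a minimal sketch: the helper below runs a stop command and then a start command. The function name stop_then_start and the daemon path in the usage comment are hypothetical; the real contents of start_SLIP.sh are not shown here.

```shell
#!/bin/sh
# stop_then_start STOP_CMD START_CMD
# Runs the stop command first, ignoring its exit status because the
# process may already be gone, then runs the start command.
stop_then_start() {
    stop_cmd=$1
    start_cmd=$2
    sh -c "$stop_cmd" || true
    sh -c "$start_cmd"
}

# Example call inside a start script (the daemon path is a placeholder):
# stop_then_start "/bin/sh /root/stop_SLIP.sh" "/path/to/daemon &"
```

The exit status of the helper is that of the start command, so a failed start is still reported to Monit.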

Running

Test mail by running:

echo "This is the message body" | mail -s "This is the subject" mail@example.com

You can start and stop monit with:

service monit start

or

service monit stop

Checking status:

root@beaglebone:/var/log# monit status
The Monit daemon 5.4 uptime: 0m 

Process 'SLIPstream'
  status                            Running
  monitoring status                 Monitored
  pid                               18858
  parent pid                        18846
  uptime                            0m 
  children                          0
  memory kilobytes                  452
  memory kilobytes total            452
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Mon, 16 Jun 2014 17:18:07

Process 'mio_gateway'
  status                            Running
  monitoring status                 Monitored
  pid                               18859
  parent pid                        18851
  uptime                            0m 
  children                          0
  memory kilobytes                  2264
  memory kilobytes total            2264
  memory percent                    0.4%
  memory percent total              0.4%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Mon, 16 Jun 2014 17:18:07

System 'system_beaglebone'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.18] [0.08] [0.11]
  cpu                               0.0%us 0.0%sy 0.0%wa
  memory usage                      141900 kB [27.8%]
  swap usage                        0 kB [0.0%]
  data collected                    Mon, 16 Jun 2014 17:18:07
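
For scripted checks, the status report can be filtered for services that are not running. The sketch below assumes the report layout shown above (Monit 5.4); the function name not_running is made up for illustration, and other Monit versions may format the report differently.

```shell
#!/bin/sh
# Reads a "monit status" report on stdin and prints the name of every
# service whose status is anything other than "Running".
not_running() {
    awk -v q="'" '
        # Remember the most recent service name, with its quotes removed
        /^(Process|System) / { name = $2; gsub(q, "", name) }
        # The per-service status line is indented by exactly two spaces
        /^  status / { if ($2 != "Running") print name }
    '
}

# Usage on the device:
# monit status | not_running
```

An empty result means every listed service reported "Running".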