Monitoring Load Average

November 14, 2008

Probably the most general indication that something’s wrong with a server is an unusually high load average. Typically the culprit is a stuck DirectoryService, httpd or imapd process that slows everything else down. Luckily, it’s pretty simple to keep tabs on load average, for example with the following script:

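#!/bin/bash

MAILTO="filipp@mydomain.tld"
COOKIE=/private/tmp/loadcheck
# Take the first number after "load averages:" (the 1-minute average; use $2
# in awk for the 5-minute one). A fixed field such as $11 shifts with the
# uptime format. The last sed drops a trailing comma and turns a decimal
# comma into a period for bc.
LOAD=$(uptime | sed 's/.*load averages*: *//' | awk '{print $1}' | sed 's/,$//;s/,/./')
SUBJECT="$(hostname) is under heavy load!"

# An alert has already been sent; stay quiet until the cookie is deleted
if [[ -e $COOKIE ]]; then
  exit 0
fi

# ps header plus the three processes using the most CPU
BODY="$(ps -rax | head -n 4)"

# bc prints 1 when the comparison is true
if [[ $(echo "$LOAD > 1" | bc) -eq 1 ]]; then
    # Quote $BODY so the line breaks in the ps output survive
    echo "$BODY" | mail -s "$SUBJECT" "$MAILTO"
    touch "$COOKIE"
fi

exit 0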

This sends the email only once; to re-arm the alert, delete the cookie file (/private/tmp/loadcheck).

Combine that with a launchd.plist that runs it, say, every 60 seconds, and you should be able to spot problems before your users do.
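As a rough sketch of what such a launchd job could look like, the snippet below writes a minimal plist into the current directory. The label tld.mydomain.loadcheck and the script path /usr/local/bin/loadcheck.sh are made-up placeholders, not names from the script above, so adjust them to wherever you saved it. Label, ProgramArguments and StartInterval are standard launchd.plist keys.

```shell
#!/bin/bash
# Hypothetical names: change the label and the script path to match your setup.
LABEL="tld.mydomain.loadcheck"
SCRIPT="/usr/local/bin/loadcheck.sh"
PLIST="./$LABEL.plist"

# Write a launchd job definition that runs the script every 60 seconds.
cat > "$PLIST" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>$LABEL</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>$SCRIPT</string>
    </array>
    <key>StartInterval</key>
    <integer>60</integer>
</dict>
</plist>
EOF

# To install it (as root), something along these lines should work:
#   cp "$PLIST" /Library/LaunchDaemons/
#   launchctl load "/Library/LaunchDaemons/$LABEL.plist"
```

The install step is left as a comment because it needs root and only makes sense on the server itself. StartInterval is given in seconds, so 60 matches the once-a-minute check suggested above.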