Charting spam

November 9, 2006

This actually did make it to, but I’m putting it up here also in the hopes that it might come in useful for someone else too:

One way to train the spam filter that comes with OS X Server (10.4) is by setting up two accounts - “junkmail” and “notjunkmail” - and redirecting missed spam to the first and false positives to the second. This is all documented on page 52 of the Mail Service manual. Since users’ Mail clients are usually quite well trained, I also instruct them to create a rule that does just that for all the email their client considers spam but that hasn’t been tagged as such by the server.

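The manual also mentions that the redirected emails are analysed every night at 1 AM, after which they should be discarded. To automate that, all we have to do is add the correct ipurge command to the crontab (I use /etc/crontab here, but normally you would just edit cyrusimap’s crontab). The entry runs at 01:30, half an hour after the analysis; -d 1 purges messages older than one day, and -f is needed because ipurge otherwise leaves mailboxes under user/ untouched.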



# min   hour    mday    month   wday    who     command
30      01      *       *       *       cyrusimap       ipurge -f -d 1 user/junkmail user/notjunkmail

I think these simple steps can go a long way in battling spam in a small business environment. One thing that’s missing, though, is any kind of overview of how much junk mail we’re actually processing, preferably with some sort of graphical representation. The MAILTO variable at the beginning of the crontab means that all the output of the ipurge command will be sent to the given address, usually the “postmaster” alias. This means we have all the necessary data and can generate the statistics on a remote machine.

I’ve chosen (what I think is) the most straightforward approach: using AWK to generate a (partial) HTML file that displays, for each processing run, the date and the number of messages, both numerically and as a bar, followed by the total number of messages. Although crude, this technique is very easy to use and doesn’t depend on any extra software, except for, which is assumed to be the mail client.

To run the script, I have to pass it a name for the generated HTML file (the of variable) followed by the email files to process:

awk -f spamchart.awk of=test.html ~/Library/Mail/Mailboxes/Cron\ Jobs/*.emlx

The script itself is very simple, with most of the typing spent on CSS for the “bars”. Please note that the total message count (per day) is assumed to be on line 32 of the email. This should be fine for default setups, but the line number must be changed if your server adds extra headers (or doesn’t add the spam headers, etc.).
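Before relying on that line number, it may be worth checking where the count actually sits in your own cron mails. The following is only a minimal sketch, assuming the emails contain the same "total messages" lines that the script matches; find_total_line is a hypothetical helper name, not part of the original script:

```shell
# Hypothetical helper: report the line number and count of every
# "total messages" line in each of the given files.
find_total_line() {
    awk '/^total messages/ {
        # FNR is the line number within the current file.
        print FILENAME ": line " FNR ": " $3 " messages"
    }' "$@"
}
```

Calling it as find_total_line ~/Library/Mail/Mailboxes/Cron\ Jobs/*.emlx prints one line per match. An email will probably produce more than one match, since ipurge is run on two mailboxes; whichever line number you want charted is the one to use in the FNR test.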

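#!/usr/bin/awk -f
# Usage: awk -f spamchart.awk of=outfile.html mailfiles...

# Remember the date of the current email, e.g. "Nov 9 2006".
/^Date: / {
    theDate = sprintf("%s %d %d", $4, $3, $5);
}

# Only count the "total messages" line found at the expected position.
/^total messages / {
    if (FNR == 32) {
        total += $3;
        # One bar per email; its width in pixels equals the message count.
        printf("<div style=\"background: silver; height: 15px; width: %dpx; font-size: x-small;\">%s %d</div>\n", $3, theDate, $3) > of;
    }
}

END { printf("<br />Total messages: %d\n", total) > of; }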
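To see how the Date rule rearranges the header fields, here is a small sketch that feeds it a single made-up Date header (the header value and the format_date name are assumptions for illustration only):

```shell
# Hypothetical helper: apply the same field reordering as the Date rule.
# With "Date: Thu, 9 Nov 2006 ...", $3 is the day, $4 the month, $5 the year.
format_date() {
    awk '/^Date: / { printf "%s %d %d\n", $4, $3, $5 }'
}

# Made-up sample header; prints "Nov 9 2006".
printf 'Date: Thu, 9 Nov 2006 01:30:02 +0100\n' | format_date
```

Note that this relies on the header starting with a day name. A Date header without one would shift every field by one position, so the field numbers would need adjusting in that case.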

Here’s a sample of the output. Having a graphical view of our spam, I can immediately see that the numbers have been climbing steadily since August of this year. I guess I better get back to work then…