SpamProbe

SpamProbe - Running A Multi-User Server
by Brian Burton
http://spamprobe.sourceforge.net

Introduction

In the summer of 2002 Paul Graham wrote an excellent article, A Plan For Spam, about filtering spam using Bayes' rule. Many filters have been implemented along those lines. This paper outlines how one of them, SpamProbe, can be used to provide individualized Bayesian filtering for multiple users on a single mail server. This technique requires no special software on the client side other than an IMAP-compatible mail client such as Mozilla, Outlook Express, or Mulberry.

Required Software

In order to implement this technique you'll need a mail server with the following software installed:

SpamProbe to classify incoming email.
procmail to run SpamProbe on each incoming email and file email into appropriate mailboxes.
An IMAP mail server to allow users to read their email. The IMAP server must store mailboxes in mbox or maildir format so that SpamProbe can process them.
Cron to run scripts to periodically update the SpamProbe database.

The author uses this technique to provide SpamProbe-based filtering for multiple people on a mail server running RedHat Linux 7.x. RedHat comes with all of the above except SpamProbe as easily installed RPMs (or preinstalled in some cases).

IMAP

This technique requires that all of your users access their email from an IMAP server. Selection and installation of an IMAP server is beyond the scope of this article. Some general tips might be helpful though:

Use IMAP over SSL if at all possible. The imapd shipped with RedHat Linux supports this. You will need to obtain or generate a certificate to use SSL. A self-signed certificate may cause annoying warnings from some mail clients.
If possible use different passwords for users of the IMAP server than those used by their login accounts. This is important because many users will tell their email clients to remember the passwords and those could be obtained by a cracker that compromises their PCs. Using CRAM-MD5 authentication with imapd makes this fairly painless. If you do this you can also disable shell logins by mail users as an added security precaution.

Once you have an IMAP server running and your users' email accounts established, the next step is to download and install SpamProbe on your mail server. You can obtain SpamProbe from its download page on SourceForge. Follow the instructions in the README.txt file to compile and install the program.

Every user on the system will need to have four special mailboxes. They can have any names you like but I recommend: nonspam, spam, remove, and spamprobe. Incoming email will be stored into either the user's INBOX or spamprobe folder depending on SpamProbe's classification. The other folders can be used by users to correct mistakes made by SpamProbe (more on this later). For purposes of this article we'll assume that the folders are placed in a subdirectory named IMAP under each user's home directory.

Each user on the system will need to have a directory named .spamprobe in their home directory. SpamProbe stores its database and assorted support files in this directory.
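
The per-user layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name prepare_user is hypothetical, the folders are assumed to be mbox files (an empty file serves as an empty mbox), and the permission modes are a suggestion rather than anything SpamProbe requires. If the script runs as root, ownership of the created paths would still need to be handed to the user, and some IMAP servers may require the folders to be created or subscribed through the server instead.

```python
from pathlib import Path

# Folder names recommended in the article.
SPECIAL_FOLDERS = ("nonspam", "spam", "remove", "spamprobe")


def prepare_user(home: Path, imap_subdir: str = "IMAP") -> list[Path]:
    """Create ~/.spamprobe and the four special mbox folders for one user.

    Returns the paths that were newly created, so a second run returns [].
    """
    created = []

    # Directory where SpamProbe keeps its database and support files.
    db_dir = home / ".spamprobe"
    if not db_dir.exists():
        db_dir.mkdir(parents=True, mode=0o700)
        created.append(db_dir)

    # Subdirectory holding the user's IMAP folders (name assumed by the article).
    imap_dir = home / imap_subdir
    imap_dir.mkdir(parents=True, exist_ok=True)

    for name in SPECIAL_FOLDERS:
        mbox = imap_dir / name
        if not mbox.exists():
            mbox.touch(mode=0o600)  # empty file == empty mbox (assumption)
            created.append(mbox)

    return created
```

Calling prepare_user(Path("/home/alice")) once per account would set up a user named alice; the path is only an example.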

Procmail

You will need either procmail or maildrop on the server to run SpamProbe on each incoming email. This article assumes the use of procmail. RedHat Linux ships with procmail and automatically invokes it on each incoming email, so no separate installation is required.

Each user will need to have a .procmailrc file that runs SpamProbe on each incoming message and adds an X-SpamProbe: header with the message's score and digest. Procmail then diverts spams to the spamprobe folder and passes hams to the user's INBOX.
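
To make the sorting decision concrete, here is a small Python sketch of the logic that the .procmailrc recipe carries out. It is not a replacement for the sample .procmailrc in the README.txt. It assumes that the classifier prints a single line made of a verdict word (SPAM or GOOD), a numeric score, and a digest; that exact layout is an assumption here and should be checked against the README.txt. The function names are hypothetical.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    is_spam: bool
    score: float
    digest: str


def parse_verdict(line: str) -> Verdict:
    """Parse a result line assumed to look like 'SPAM 0.9987 <digest>'."""
    parts = line.split()
    if len(parts) < 3 or parts[0] not in ("SPAM", "GOOD"):
        raise ValueError(f"unrecognized classifier output: {line!r}")
    return Verdict(parts[0] == "SPAM", float(parts[1]), parts[2])


def header_and_folder(line: str) -> tuple[str, str]:
    """Return the X-SpamProbe header to add and the folder to deliver into."""
    verdict = parse_verdict(line)
    # Carry the verdict, score and digest into the header unchanged.
    header = "X-SpamProbe: " + " ".join(line.split()[:3])
    # Spam goes to the spamprobe folder, everything else to the INBOX.
    folder = "spamprobe" if verdict.is_spam else "INBOX"
    return header, folder
```

With this sketch, a result line beginning with SPAM leads to delivery into the spamprobe folder, and one beginning with GOOD leads to the INBOX.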

The README.txt contains a sample .procmailrc file. The contrib directory of the SpamProbe distribution contains a sample maildrop config file.

Cron

Procmail will run SpamProbe to classify and sort incoming email. Unfortunately spam filters are not perfect and will make mistakes. Users will correct these mistakes by moving messages between folders in their email client.

When a ham is mistakenly placed in the spamprobe folder the user can drag the email into the nonspam folder. SpamProbe will then reclassify the message and update its database accordingly.
When a spam is mistakenly delivered to the user's INBOX the user can drag the email into the spam folder. SpamProbe will then reclassify the message and update its database accordingly.
Sometimes a user may decide that an email should not be classified at all. This might happen when a friend forwards a sample spam to the user. Classifying such a message as either spam or ham would confuse the scores of words in the message. For example, the friend's email address would look spammy if the message is classified as spam but words in the sample spam would appear hammy if the message is classified as ham. When a message should be ignored by SpamProbe the user can drag the email into the remove folder. SpamProbe will then remove the message from its database.

In order for these database updates to be performed, SpamProbe must be run periodically to scan the special mail folders. The README.txt file provides a sample shell script that cron can run at regular intervals.
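
The mapping from correction folders to SpamProbe commands can be sketched as follows. This is a minimal illustration, not the README's script: it assumes mbox folders under an IMAP subdirectory as described earlier, assumes that spamprobe is on the PATH and is run as the mail user so that it finds that user's own database, and uses the good, spam and remove commands of SpamProbe. The helper names are hypothetical, and the sketch does not empty the folders afterwards.

```python
import subprocess
from pathlib import Path

# Correction folder -> SpamProbe command applied to its messages.
CORRECTIONS = {"nonspam": "good", "spam": "spam", "remove": "remove"}


def correction_commands(home: Path, imap_subdir: str = "IMAP") -> list[list[str]]:
    """Build one spamprobe command line per non-empty correction folder."""
    commands = []
    for folder, command in CORRECTIONS.items():
        mbox = home / imap_subdir / folder
        # Skip folders that are missing or contain no mail.
        if mbox.is_file() and mbox.stat().st_size > 0:
            commands.append(["spamprobe", command, str(mbox)])
    return commands


def run_corrections(home: Path, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for argv in correction_commands(home):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Running run_corrections(Path.home()) in its default dry-run mode only prints the commands, which is a safe way to check the folder paths before scheduling the job in cron.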

SpamProbe should also be run periodically to clean up its database. The SpamProbe contrib directory contains an excellent cleanup script that could be run nightly.

Conclusion

Hopefully this article will be helpful when setting up a multi-user mail server with SpamProbe-based spam filtering. Over time we'll update this page to contain more useful information. You may also want to subscribe to the SpamProbe mailing list to keep up to date with the latest developments or offer suggestions for improving SpamProbe.