SpamProbe - Running A Multi-User Server
by Brian Burton
In the summer of 2002 Paul Graham wrote an excellent article, A Plan For Spam, about filtering spam using Bayes' rule. Many filters have been implemented along those lines. This paper outlines how one of them, SpamProbe, can be used to provide individualized Bayesian filtering for multiple users on a single mail server. This technique requires no special software on the client side other than an IMAP-compatible mail client such as Mozilla, Outlook Express, or Mulberry.
In order to implement this technique you'll need a mail server with the following software installed:
The author uses this technique to provide SpamProbe-based filtering for multiple people on a mail server running RedHat Linux 7.x. RedHat comes with all of the above except SpamProbe as easily installed RPMs (or preinstalled in some cases).
This technique requires that all of your users access their email from an IMAP server. Selection and installation of an IMAP server is beyond the scope of this article. Some general tips might be helpful though:
Once you have an IMAP server running and your users' email accounts established, the next step is to download and install SpamProbe on your mail server. You can obtain SpamProbe from its download page on SourceForge. Follow the instructions in the README.txt file to compile and install the program.
Every user on the system will need to have four special mailboxes. They can have any names you like but I recommend: nonspam, spam, remove, and spamprobe. Incoming email will be stored in either the user's INBOX or the spamprobe folder, depending on SpamProbe's classification. The other folders can be used by users to correct mistakes made by SpamProbe (more on this later). For purposes of this article we'll assume that the folders are placed in a subdirectory named IMAP under each user's home directory.
Each user on the system will need to have a directory named .spamprobe in their home directory. SpamProbe stores its database and assorted support files in this directory.
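The per-user layout described above could be created with a short provisioning script. What follows is only a minimal sketch in Python: the function name provision_user is hypothetical, and it assumes that your IMAP server stores folders as plain mbox files under ~/IMAP. Servers that use Maildir or another storage format will need a different layout, so check your server's documentation before relying on it.

```python
from pathlib import Path

# Folder names recommended in this article; any names would work.
SPECIAL_FOLDERS = ("nonspam", "spam", "remove", "spamprobe")


def provision_user(home):
    """Create ~/.spamprobe and the four special mailboxes; return the paths."""
    home = Path(home)
    created = []

    # SpamProbe keeps its database here, so keep it private to the owner.
    db_dir = home / ".spamprobe"
    db_dir.mkdir(mode=0o700, exist_ok=True)
    created.append(db_dir)

    # Assumption: mbox-style folders are plain files under ~/IMAP.
    imap_dir = home / "IMAP"
    imap_dir.mkdir(exist_ok=True)
    for name in SPECIAL_FOLDERS:
        mbox = imap_dir / name
        mbox.touch(mode=0o600, exist_ok=True)  # leaves existing mail untouched
        created.append(mbox)
    return created
```

If the script is run as root rather than as the user, the created paths would also need to be handed over to that user (for example with chown), which this sketch does not do.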
You will need either procmail or maildrop on the server to run SpamProbe on each incoming email. This article assumes the use of procmail. RedHat Linux ships with procmail and automatically invokes it on each incoming email, so no separate installation is required.
Each user will need to have a .procmailrc file that runs SpamProbe on each incoming message and adds an X-SpamProbe: header with the message's score and digest. Procmail then diverts spams to the spamprobe folder and passes hams to the user's INBOX.
The README.txt contains a sample .procmailrc file. The contrib directory of the SpamProbe distribution contains a sample maildrop config file.
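For a server with many users it can be convenient to generate each .procmailrc from a template. The sketch below is modeled loosely on the kind of recipe described above and is not a copy of the README.txt sample, which remains the authoritative version. The function name render_procmailrc, the install path /usr/local/bin/spamprobe, and the folder names are assumptions to adjust for your system. It relies on the spamprobe receive command printing a one-line verdict that begins with SPAM or GOOD.

```python
PROCMAILRC_TEMPLATE = """\
MAILDIR=$HOME/{mail_dir}

# Score the message and capture SpamProbe's one-line verdict.
:0
SCORE=| {spamprobe} receive

# Record the verdict in an X-SpamProbe: header.
:0 wf
| formail -I "X-SpamProbe: $SCORE"

# Divert spam; everything else falls through to the user's INBOX.
:0 a:
* ^X-SpamProbe: SPAM
{spam_folder}
"""


def render_procmailrc(spamprobe="/usr/local/bin/spamprobe",
                      mail_dir="IMAP", spam_folder="spamprobe"):
    """Return the text of a per-user .procmailrc (all defaults are assumptions)."""
    return PROCMAILRC_TEMPLATE.format(
        spamprobe=spamprobe, mail_dir=mail_dir, spam_folder=spam_folder
    )
```

Writing the result of render_procmailrc() to .procmailrc in a user's home directory gives that user the filtering behavior described above. Test it on a single account before rolling it out to everyone.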
Procmail will run SpamProbe to classify and sort incoming email. Unfortunately spam filters are not perfect and will make mistakes. Users will correct these mistakes by moving messages between folders in their email client.
In order for these database updates to be performed, SpamProbe must be run periodically to scan the special mail folders. The README.txt file provides a sample shell script to be run by cron periodically.
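To illustrate what such a periodic scan involves, here is a rough Python sketch. It is not the README.txt script. The mapping of folders to the SpamProbe commands good, spam, and remove is an assumption inferred from the folder names recommended earlier, and the helper names correction_commands and apply_corrections are hypothetical. It also assumes mbox files under ~/IMAP and that spamprobe is on the PATH.

```python
import subprocess
from pathlib import Path

# Assumed mapping from special folder to SpamProbe command.
FOLDER_COMMANDS = {"nonspam": "good", "spam": "spam", "remove": "remove"}


def correction_commands(home, spamprobe="spamprobe"):
    """Build one spamprobe command line per non-empty special folder."""
    imap_dir = Path(home) / "IMAP"
    commands = []
    for folder, command in FOLDER_COMMANDS.items():
        mbox = imap_dir / folder
        # Skip folders that are missing or contain no messages.
        if mbox.is_file() and mbox.stat().st_size > 0:
            commands.append([spamprobe, command, str(mbox)])
    return commands


def apply_corrections(home):
    """Run the commands; intended to be invoked as the mailbox owner from cron."""
    for argv in correction_commands(home):
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from the execution makes it easy to print the commands for a dry run before letting cron execute them. The sketch does not empty the folders after processing them.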
SpamProbe should also be run periodically to clean up its database. The SpamProbe contrib directory contains an excellent cleanup script that could be run nightly.
Hopefully this article will be helpful when setting up a multi-user mail server with SpamProbe-based spam filtering. Over time we'll update this page to contain more useful information. You may also want to subscribe to the SpamProbe mailing list to keep up to date with the latest developments or offer suggestions for improving SpamProbe.