Bogofilter

Tested: 2+ weeks, test written and passed

Description

Bogofilter is a statistical analysis mail filter. It has to be trained with SPAM and HAM mails before it can be used.

Configuration

Please read the following first:

  1. default configuration
    • disable
    • max_size
  2. anti-spam module configuration
    • weight_innocent
    • weight_spam
    • weight_translate
  3. user configuration
    • default_user
    • user_cmd

default_user

Default: -
Allowed values: string (path)
Required: no

Should contain the path to a global bogofilter.cf file if a shared SPAM database should be used. Typically, that would be /etc/bogofilter.cf.

cmd_check

Default: /usr/bin/bogofilter -c %user% -U -I %file% -v
Allowed values: string (path to bogofilter and cmd args)
Required: yes

The command line for the bogofilter check, including all command line arguments. The variables %user% (the user) and %file% (the path to the temporary mail file) can be used.
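The placeholders are plain text substitutions into the command template. Below is a minimal sketch of that expansion, assuming simple literal replacement; the helper names replace_all and expand_cmd are hypothetical and this is not decency's own implementation.

```shell
#!/bin/sh
# Hypothetical helpers (not part of decency): expand the %user% and %file%
# placeholders of a command template such as cmd_check.

# replace_all STRING SEARCH REPLACEMENT
# Literal (non-regex) replacement of every occurrence of SEARCH in STRING.
# Note: awk -v interprets backslash escapes, so avoid backslashes in values.
replace_all() {
    awk -v s="$1" -v f="$2" -v r="$3" 'BEGIN {
        out = ""
        while ((i = index(s, f)) > 0) {
            out = out substr(s, 1, i - 1) r
            s = substr(s, i + length(f))
        }
        printf "%s\n", out s
    }'
}

# expand_cmd TEMPLATE USER FILE
expand_cmd() {
    cmd=$(replace_all "$1" '%user%' "$2")
    replace_all "$cmd" '%file%' "$3"
}
```

For example, `expand_cmd '/usr/bin/bogofilter -c %user% -U -I %file% -v' /etc/bogofilter.cf /tmp/mail.eml` prints `/usr/bin/bogofilter -c /etc/bogofilter.cf -U -I /tmp/mail.eml -v`. Since the default commands pass %user% to `-c`, setting default_user to a config file path makes bogofilter read that file.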

cmd_learn_spam

Default: /usr/bin/bogofilter -c %user% -s -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for learning SPAM with bogofilter.

cmd_unlearn_spam

Default: /usr/bin/bogofilter -c %user% -N -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for unlearning SPAM, i.e. for mails which have previously been marked as SPAM.

cmd_learn_ham

Default: /usr/bin/bogofilter -c %user% -n -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for learning new HAM.

cmd_unlearn_ham

Default: /usr/bin/bogofilter -c %user% -n -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for unlearning HAM, i.e. for a mail which has previously been learned as HAM.
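The four learning commands correspond to bogofilter's registration options: per the bogofilter man page, -s and -n register a message as spam or ham, while -S and -N unregister it from the spam or ham list. The sketch below (hypothetical helper name bogo_learn_cmd, binary path taken from the defaults above) only prints the command line it would run, which is a convenient way to double-check the cmd_* settings.

```shell
#!/bin/sh
# Hypothetical helper: print the bogofilter command for a learning action.
# bogo_learn_cmd ACTION CONFIG FILE
bogo_learn_cmd() {
    action=$1 config=$2 file=$3
    case $action in
        learn_spam)   flag=-s ;;  # register as spam
        unlearn_spam) flag=-S ;;  # unregister from the spam list
        learn_ham)    flag=-n ;;  # register as ham (non-spam)
        unlearn_ham)  flag=-N ;;  # unregister from the ham list
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    printf '%s\n' "/usr/bin/bogofilter -c $config $flag -I $file"
}
```

For example, `bogo_learn_cmd unlearn_spam /etc/bogofilter.cf /tmp/m.eml` prints `/usr/bin/bogofilter -c /etc/bogofilter.cf -S -I /tmp/m.eml`.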

Performance

Runtime: average 0.5 secs

Bogofilter hints

This is not an in-depth guide to configuring or running bogofilter, just some hints that might come in handy. There is no guarantee that this is the best, or even a correct, way to do it, though.

Global spam directory

If you want to use one spam database rather than one per (Unix) user, you can set bogofilter_dir in /etc/bogofilter.cf:

bogofilter_dir = /var/spool/bogofilter

In the bogofilter configuration in decency, you should then set “default_user” to the global config file:

default_user = /etc/bogofilter.cf
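The two steps for a shared database can be scripted. A minimal sketch, assuming you run it with enough privileges to write the config file; the helper name setup_global_db is hypothetical, and you still have to make the directory writable for the user decency runs as (not covered here).

```shell
#!/bin/sh
# Hypothetical helper: create the shared database directory and add
# bogofilter_dir to the config file, unless the file already sets it.
# setup_global_db [CONFIG] [DIR]
setup_global_db() {
    cf=${1:-/etc/bogofilter.cf}
    dir=${2:-/var/spool/bogofilter}
    mkdir -p "$dir" || return 1
    # Idempotent: only append the directive if no bogofilter_dir line exists.
    if ! grep -q '^[[:space:]]*bogofilter_dir[[:space:]]*=' "$cf" 2>/dev/null; then
        printf 'bogofilter_dir = %s\n' "$dir" >> "$cf"
    fi
}
```

Running `setup_global_db /etc/bogofilter.cf /var/spool/bogofilter` twice leaves exactly one bogofilter_dir line in the file.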

Initial training of bogofilter

Like all statistical analysis filters, bogofilter has to be trained before it can be used. If you have a large spam collection (ham you probably have already: your inbox), say at least 10,000 mails, use it. If you don’t, you can get an initial spam corpus from here, search the web for one, or collect it via the HoneyPot/HoneyCollector modules.

Assuming your spam and ham mails are stored as individual .eml files in two directories, you can train bogofilter like this:

cd spam-ham
find ham/ -type f -exec bogofilter --user-config /etc/bogofilter.cf -n -I {} \;
find spam/ -type f -exec bogofilter --user-config /etc/bogofilter.cf -s  -I {} \;
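The same per-file training can be wrapped in a small function that also reports how many files it processed. This is a sketch with a hypothetical helper name (train_dir) and a hypothetical BOGOFILTER variable for overriding the binary path; it passes the config file with -c, as the decency default command lines do.

```shell
#!/bin/sh
# Hypothetical helper: train bogofilter from a directory of mail files,
# one bogofilter call per file (like the find commands above).
# BOGOFILTER may point at another binary; defaults to /usr/bin/bogofilter.
# train_dir spam|ham DIR [CONFIG]
train_dir() {
    kind=$1 dir=$2 cf=${3:-/etc/bogofilter.cf}
    bin=${BOGOFILTER:-/usr/bin/bogofilter}
    case $kind in
        spam) flag=-s ;;   # register as spam
        ham)  flag=-n ;;   # register as ham (non-spam)
        *) echo "usage: train_dir spam|ham DIR [CONFIG]" >&2; return 2 ;;
    esac
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # -exec ... \; runs bogofilter once per file; {} keeps names with spaces intact.
    find "$dir" -type f -exec "$bin" -c "$cf" "$flag" -I {} \;
    n=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$kind: $n files"
}
```

Typical use: `train_dir ham ham/` followed by `train_dir spam spam/`.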

Or if you have mbox files:

cd spam-ham
bogofilter --user-config /etc/bogofilter.cf -n < ham.mbox
bogofilter --user-config /etc/bogofilter.cf -s < spam.mbox
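After training, it can help to classify a held-out sample of known mails and look at the results. According to the bogofilter man page, the exit status is 0 for spam, 1 for ham (non-spam), 2 for unsure and 3 for errors. A minimal sketch, with a hypothetical helper name (spot_check) and the same hypothetical BOGOFILTER override as above:

```shell
#!/bin/sh
# Hypothetical helper: classify every file in DIR and count the verdicts
# based on bogofilter's exit status (0 spam, 1 ham, 2 unsure, other error).
# spot_check DIR [CONFIG]
spot_check() {
    dir=$1 cf=${2:-/etc/bogofilter.cf}
    bin=${BOGOFILTER:-/usr/bin/bogofilter}
    spam=0 ham=0 unsure=0 err=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        "$bin" -c "$cf" -I "$f" >/dev/null 2>&1
        case $? in
            0) spam=$((spam + 1)) ;;
            1) ham=$((ham + 1)) ;;
            2) unsure=$((unsure + 1)) ;;
            *) err=$((err + 1)) ;;
        esac
    done
    echo "spam=$spam ham=$ham unsure=$unsure error=$err"
}
```

Run it on mails that were not used for training, e.g. `spot_check test-spam/`; a well-trained database should report most of them as spam.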

More detailed information can be found in the Bogofilter FAQ.
