Bogofilter
Tested: 2+ weeks, test written and passed
Description
Bogofilter is a statistical mail filter. It has to be trained with SPAM and HAM mails before it can be used.
Configuration
Please read beforehand:
- default configuration
  - disable
  - max_size
- anti-spam module configuration
  - weight_innocent
  - weight_spam
  - weight_translate
- user configuration
  - default_user
  - user_cmd
default_user
Default: -
Allowed values: string (path)
Required: no
Should contain the path to a global bogofilter.cf file if a shared SPAM database should be used. Typically, that would be /etc/bogofilter.cf.
cmd_check
Default: /usr/bin/bogofilter -c %user% -U -I %file% -v
Allowed values: string (path to bogofilter and cmd args)
Required: yes
The command line for the bogofilter check, including all command line arguments. The variables %user% (the user) and %file% (the path to the temporary mail file) can be used.
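The placeholders are substituted before the command is run. As an illustration only, the following bash sketch shows such a substitution; the `expand_cmd` function is hypothetical and not decency's own code, and it assumes the template is a plain whitespace-separated list without quoted arguments.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of decency): expand the %user% and %file%
# placeholders of a cmd_* template and print one argument per line.
# Assumes the template is a plain whitespace-separated list without quoting.
expand_cmd() {
    local template=$1 user=$2 file=$3
    local -a tokens
    local tok
    # Split the template into words on whitespace (no globbing happens here).
    read -r -a tokens <<< "$template"
    for tok in "${tokens[@]}"; do
        # Replace every occurrence of each placeholder inside the word;
        # the replacement is quoted so characters like & stay literal.
        tok=${tok//%user%/"$user"}
        tok=${tok//%file%/"$file"}
        printf '%s\n' "$tok"
    done
}
```

Printing one argument per line lets a caller rebuild the argument list with `mapfile -t argv < <(expand_cmd "$template" "$user" "$file")` and run it as `"${argv[@]}"`, so a mail path containing spaces stays a single argument.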
cmd_learn_spam
Default: /usr/bin/bogofilter -c %user% -s -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes
Command line for learning SPAM with bogofilter.
cmd_unlearn_spam
Default: /usr/bin/bogofilter -c %user% -N -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes
Command line for UNlearning SPAM, for mails which have been marked as spam beforehand.
cmd_learn_ham
Default: /usr/bin/bogofilter -c %user% -n -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes
Command line for learning new HAM.
cmd_unlearn_ham
Default: /usr/bin/bogofilter -c %user% -n -I %file%
Allowed values: string (path to bogofilter and cmd args)
Required: yes
Command line for UNlearning a mail which has been falsely recognized as SPAM.
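For reference, the bogofilter(1) man page documents four registration flags: `-s` registers a message as spam, `-n` as non-spam (ham), `-S` unregisters it from the spam list and `-N` from the non-spam list. The bash sketch below builds cmd_* templates in the same style as the defaults above from that mapping; the `bogo_flag` and `bogo_template` helpers are hypothetical and not part of decency. Since the unlearn defaults listed above use `-N` and `-n`, you may want to compare them with the man page of your bogofilter version.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of decency). Flag meanings as documented in
# bogofilter(1):
#   -s  register message as spam        -n  register message as non-spam (ham)
#   -S  unregister message from spam    -N  unregister message from non-spam
bogo_flag() {
    local flag
    case $1 in
        learn_spam)   flag=-s ;;
        learn_ham)    flag=-n ;;
        unlearn_spam) flag=-S ;;
        unlearn_ham)  flag=-N ;;
        *) printf 'unknown action: %s\n' "$1" >&2; return 1 ;;
    esac
    # printf rather than echo: "echo -n" would be taken as an echo option.
    printf '%s\n' "$flag"
}

# Build a cmd_* template in the style of the defaults above;
# %% is a literal % in a printf format string.
bogo_template() {
    local flag
    flag=$(bogo_flag "$1") || return 1
    printf '/usr/bin/bogofilter -c %%user%% %s -I %%file%%\n' "$flag"
}
```

For example, `bogo_template unlearn_spam` prints `/usr/bin/bogofilter -c %user% -S -I %file%`.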
Performance
Runtime: average 0.5 secs
Bogofilter hints
This is not an in-depth guide to configuring or running bogofilter, just some hints that might come in handy. There is no warranty that this is the best or even a correct way to do it, though.
Global spam directory
If you want to use one spam database for all users rather than one per (unix) user, you can set bogofilter_dir in /etc/bogofilter.cf:
bogofilter_dir = /var/spool/bogofilter
In the Bogofilter configuration of decency, you should then set “default_user” to the global config file:
default_user = /etc/bogofilter.cf
Initial bogofilter training
Like all statistical filters, bogofilter has to be trained before it can do its job. If you have a large spam collection, let’s say at least 10,000 mails, use it (ham you probably have already: your inbox). If you don’t, you can get an initial spam corpus from here, search for one on Google, or collect it via the HoneyPot/HoneyCollector modules.
Assuming your spam and ham mails are stored as .eml files (one mail per file) in two directories, spam/ and ham/, you can train bogofilter like this:
cd spam-ham
find ham/ -type f -exec bogofilter --user-config /etc/bogofilter.cf -n -I {} \;
find spam/ -type f -exec bogofilter --user-config /etc/bogofilter.cf -s -I {} \;
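The two find commands can also be wrapped in a small bash function. In this sketch, `train_dir` and the `DRY_RUN` switch are hypothetical additions; the bogofilter invocation itself is the one used above. Using `find -print0` keeps file names with spaces intact, and sorting makes the processing order reproducible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: register every regular file below a directory with
# bogofilter, using the same command line as the find examples above.
# Usage: train_dir <dir> <flag> [config]   (flag: -n for ham, -s for spam)
# With DRY_RUN=1 the commands are only printed, not executed.
train_dir() {
    local dir=$1 flag=$2 config=${3:-/etc/bogofilter.cf}
    local file
    # -print0 together with read -d '' keeps names with spaces intact;
    # sort -z makes the processing order reproducible.
    while IFS= read -r -d '' file; do
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            printf 'bogofilter --user-config %s %s -I %s\n' "$config" "$flag" "$file"
        else
            # </dev/null so bogofilter can never consume the file list on stdin.
            bogofilter --user-config "$config" "$flag" -I "$file" </dev/null
        fi
    done < <(find "$dir" -type f -print0 | sort -z)
}
```

For example, `train_dir ham -n` followed by `train_dir spam -s` mirrors the two find commands; prefix a call with `DRY_RUN=1` to preview it first.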
Or if you have mbox files:
cd spam-ham
bogofilter --user-config /etc/bogofilter.cf -n < ham.mbox
bogofilter --user-config /etc/bogofilter.cf -s < spam.mbox
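To check whether your corpus comes close to the 10,000 mails suggested above, you can count it before training. A rough bash sketch with hypothetical `count_mbox` and `count_dir` helpers: in an mbox file each message starts with a line beginning with `From `, while body lines starting that way are normally escaped as `>From `, so the mbox count is an estimate.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for estimating corpus size.
# mbox: count the "From " separator lines that start each message.
count_mbox() {
    grep -c '^From ' "$1"
}

# Directory of .eml files: one message per regular file
# (tr strips the padding some wc implementations add).
count_dir() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

For example, `count_mbox spam.mbox` or `count_dir spam/` prints the number of mails found.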
More detailed information can be found in the Bogofilter FAQ.
My name is Ulrich Kautz and this is my private blog about server administration, Perl programming and some other stuff that is on my mind. I study computer science part-time at FU Berlin and work as a sysadmin and web developer at our hosting company.