Optimizing DSPAM + MySQL 4.1
Introduction
DSPAM is a scalable, open-source, content-based spam filter designed for multi-user enterprise systems. It is great at filtering out spam, but on busy mail servers pruning the MySQL databases takes far too long. The default purge-4.1.sql script provided with DSPAM can be heavily optimized by adding indexes to the database and using those indexes properly when pruning. With the default script and table structure, the pruning queries cause full table scans, because the data is either not indexed or the indexes are not used properly. Adding indexes and writing the queries so that they can use them lets us query the databases extremely fast.

Adding indexes
To begin with, we need to add some indexes to the dspam_token_data table. Indexes let us query the database much faster, since we no longer have to do full table scans. (Our current DSPAM database is 8.5 GB in size, and full table scans literally bring the entire mail server to a standstill.) Connect to the database server and issue the following commands:
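As a rough illustration of what an index buys, here is a minimal sketch rather than the commands run on the production server. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, a cut-down dspam_token_data table, and made-up index names (idx_spam_hits and so on). The CREATE INDEX statements it runs are written in a form that MySQL accepts as well.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server,
# and the table is a simplified version of the real DSPAM schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dspam_token_data ("
    " uid INTEGER, token TEXT,"
    " spam_hits INTEGER, innocent_hits INTEGER,"
    " last_hit TEXT)"  # ISO dates stored as text in this sketch
)


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM dspam_token_data WHERE last_hit < '2005-01-01'"

# No usable index exists yet, so the planner has to read the whole table.
plan_before = plan(query)

# Index names are hypothetical; the columns are the three named in the text.
for column in ("spam_hits", "innocent_hits", "last_hit"):
    conn.execute(f"CREATE INDEX idx_{column} ON dspam_token_data ({column})")

# With the index in place the planner can seek directly to matching rows.
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the whole table; once the indexes exist, the same query is answered by a search on idx_last_hit. In MySQL, EXPLAIN serves the same purpose.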
The commands add indexes to the spam_hits, innocent_hits and last_hit columns. The dspam_signature_data table is already properly indexed; however, its indexes are not used properly when cleaning out old data (more about this below). The interesting parts of the script provided with DSPAM are as follows:
This query cannot use the index on the last_hit column: because the to_days function is called on the field, the function has to be evaluated for every row and we lose the ability to use the index. Also notice that the newly added indexes on innocent_hits and spam_hits are used here. Change the query to:
Next query:
Same problem - change this to:
Next query:
Change this to:
Next query:
Change this to:
Next query:
Change this to:
And finally:
should be changed to:
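The recurring problem above, a function call wrapped around an indexed column, can be shown in isolation. The sketch below is not the DSPAM purge script: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, uses julianday() in place of to_days(), and picks a 60-day cutoff purely for illustration. Both queries ask for roughly the same rows; only the second leaves last_hit bare on one side of the comparison.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and index names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dspam_token_data (token TEXT, last_hit TEXT)")
conn.execute("CREATE INDEX idx_last_hit ON dspam_token_data (last_hit)")


def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# The indexed column is passed through a function, so the planner cannot
# compare index entries against a known value and falls back to a scan.
wrapped = plan(
    "SELECT * FROM dspam_token_data "
    "WHERE julianday('now') - julianday(last_hit) > 60"
)

# The cutoff date is computed once on the right-hand side and the column
# is left bare, so the planner can do a range search on the index.
bare = plan(
    "SELECT * FROM dspam_token_data "
    "WHERE last_hit < date('now', '-60 days')"
)

print("function on column:", wrapped)
print("bare column:       ", bare)
```

In MySQL the analogous rewrite would compare last_hit against a date computed once, for example CURDATE() - INTERVAL 60 DAY, instead of calling to_days(last_hit) for every row.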
Test run with the changed prune script
So do the changes help? Yes! Below are the timings for the old, unmodified script:
And for the new modified script (used on the same dataset):
Pruning took almost 3 minutes with the default DSPAM script and less than 2 seconds with the altered script and indexes.

Pros and cons
Adding indexes to tables uses far more disk space for your data. If you need the performance when pruning data and can afford the extra disk space, add the indexes and change your prune script as explained above. If you only have a small amount of data in your database and performance isn't an issue, stick with the default DSPAM script.

Feedback
All feedback is appreciated. Feel free to contact me via email: laursen[at]netgroup.dk