[dspam-users] Tuning dspam maximum message size (was: Re: Dspam eats CPU)

From: Casey Allen Shobe <lists@seattleserver.com>
Date: Sun Feb 26 2006 - 04:23:59 EST

On Saturday 25 February 2006 18:46, Mike Horwath wrote:
> Personally, I think people should review their needs in terms of what
> size they should use.
>
> In my collection of spam, the largest I can find is 87KB.
>
> You *might* save some CPU (quite a bit :) by not scanning messages
> over a specific size, and each 'users' size might be different.
>
> 'user' in this case means the system as a whole.
>
> For me - I set it to 256KB.

That's a really good point. I've now done the same. Not only should it save
CPU, but some database bloat as well (which will likely in turn save more CPU
and disk I/O). I don't know why it didn't occur to me earlier that spammers
don't generally send large attachments because of the cost, or why I never
thought to look at my spam collection to see what the largest message was.

This should cut down on some false positives too. On that thought, I just
went and had some fun with find and grep on all the filtered spam from the
last 7 days that users haven't retrained or deleted (some are lazy) and
discovered:

1. *every* filtered email over 100KB was a false positive. 5 total.
2. There were only 2 filtered spam messages of a "largish" size - one was
63KB and one was 76KB, each a real spam with an attached image.
3. The 27KB-63KB range was a 50/50 mix of spam and false positives.
4. The 10KB-27KB range was 100% real spam.

Filtered spam counts by size range:

less than 1KB - 2276
1KB-10KB - 11726
10KB-50KB - 57
50KB-100KB - 2
greater than 100KB - 5 (all false positives)
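
For anyone who wants to repeat this kind of count on their own collection,
here is a rough Python sketch. It is not the find and grep commands used for
the numbers above. It assumes the filtered spam is stored as one file per
message somewhere under a single directory (the path in the usage note is
hypothetical), and it measures size on disk, headers included. The range
boundaries simply mirror the ones in the table.

```python
import os
from collections import Counter

KB = 1024

# Exclusive upper bounds and labels, mirroring the ranges in the table above.
BUCKETS = [
    (1 * KB, "less than 1KB"),
    (10 * KB, "1KB-10KB"),
    (50 * KB, "10KB-50KB"),
    (100 * KB, "50KB-100KB"),
]
OVERFLOW = "greater than 100KB"


def bucket_for(size_bytes):
    """Return the label of the size range that size_bytes falls into."""
    for upper, label in BUCKETS:
        if size_bytes < upper:
            return label
    return OVERFLOW


def count_by_size(root):
    """Walk root and count message files per size range.

    Assumption: every regular file under root is exactly one message.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            counts[bucket_for(os.path.getsize(path))] += 1
    return counts
```

Usage would be something like count_by_size("/path/to/filtered-spam"), then
printing the counter. Telling real spam from false positives inside each range
still takes a look at the messages themselves.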

So I'm setting our threshold to 100KB. I have to wonder if it wouldn't be
advantageous for dspam to set a lower spam probability for e-mail over 10KB,
and especially over 100KB, based on these numbers. I would be curious as to
what others with larger datasets might discover when doing similar testing.
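
To put the table in proportion, a few lines of Python can add up the counts
and show how small the over-10KB share is. The numbers are copied from the
table above and nothing else is assumed. It comes to 64 messages over 10KB
out of 14066 filtered in total.

```python
# Counts copied from the size-range table in this message.
counts = {
    "less than 1KB": 2276,
    "1KB-10KB": 11726,
    "10KB-50KB": 57,
    "50KB-100KB": 2,
    "greater than 100KB": 5,
}

total = sum(counts.values())

# Everything that is not in the two smallest ranges is over 10KB.
over_10kb = sum(
    n for label, n in counts.items()
    if label not in ("less than 1KB", "1KB-10KB")
)

for label, n in counts.items():
    print(f"{label:>20}: {n:6d}  {100.0 * n / total:6.2f}%")
print(f"over 10KB: {over_10kb} of {total}")
```

This only covers one week on one system, so it is a starting point for
comparison and not a basis for a general rule.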

Thanks a bundle!

-- 
Casey Allen Shobe | cshobe@seattleserver.com | 206-381-2800
SeattleServer.com, Inc. | http://www.seattleserver.com
Received on Sun Feb 26 04:29:10 2006
