Re: [dspam-users] poor accuracy

From: Jonathan Zdziarski <jonathan@nuclearelephant.com>
Date: Thu Jan 26 2006 - 15:41:36 EST

Build a directory of ham and a directory of spam. Start with the SA
corpus and add any other messages from you or your users that you'd
like to include. Try a few initial training runs in different modes
and see which one gives you the best accuracy (for the plain SA corpus
alone, it is TOE), then train the production database in that mode.
You might also consider running dspam_train repeatedly under TOE mode
until there are few or no errors, which can also boost your accuracy
depending on the diversity of your corpus.
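
The repeat-until-clean idea can be sketched in Python as follows. This is only a sketch under stated assumptions: the command line follows the usage quoted later in this thread (dspam_train [username] [spam_dir] [nonspam_dir]), while the count_errors callback and the max_errors and max_passes thresholds are hypothetical. The output format of dspam_train is not shown in this thread, so parsing it is left to the caller, and the training mode is assumed to be configured already.

```python
import subprocess
from typing import Callable, List


def build_command(user: str, spam_dir: str, nonspam_dir: str) -> List[str]:
    """Argument order taken from the usage line quoted in this thread."""
    return ["dspam_train", user, spam_dir, nonspam_dir]


def run_dspam_train(cmd: List[str]) -> str:
    """Run one training pass and return whatever it printed to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def train_until_clean(
    cmd: List[str],
    count_errors: Callable[[str], int],  # hypothetical: caller parses the output
    run: Callable[[List[str]], str] = run_dspam_train,
    max_errors: int = 0,                 # hypothetical stopping threshold
    max_passes: int = 10,                # hypothetical safety cap on passes
) -> List[int]:
    """Repeat training passes until errors <= max_errors or the cap is hit.

    Returns the error count observed on each pass, in order.
    """
    history: List[int] = []
    for _ in range(max_passes):
        errors = count_errors(run(cmd))
        history.append(errors)
        if errors <= max_errors:
            break
    return history
```

A call might look like train_until_clean(build_command("merge2", "corpus/spam", "corpus/ham"), count_errors=my_parser), where the two directory paths are placeholders and my_parser is a function you write against the output of your own dspam_train version. The returned list shows whether the error count is actually falling from pass to pass.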

If you're paranoid about FPs, use toe/burton with the SA corpus.
Otherwise, toe/graham-burton will catch more spam but have a higher FP
rate.
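
When comparing modes, it can help to compute a few rates from the counts each trial produces and to state the FP tolerance as a number. The sketch below is illustrative only: the Counts type, the function names, and the max_fp_rate budget are assumptions, not part of DSPAM. The sample figures are the "Today" counts quoted further down in this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Counts:
    spam_caught: int      # spam correctly flagged
    good: int             # ham correctly delivered
    spam_missed: int      # spam delivered as ham
    false_positives: int  # ham flagged as spam

    @property
    def total(self) -> int:
        return (self.spam_caught + self.good
                + self.spam_missed + self.false_positives)


def accuracy(c: Counts) -> float:
    """Share of all messages classified correctly."""
    return (c.spam_caught + c.good) / c.total if c.total else 0.0


def spam_catch_rate(c: Counts) -> float:
    """Share of actual spam that was caught."""
    spam = c.spam_caught + c.spam_missed
    return c.spam_caught / spam if spam else 0.0


def fp_rate(c: Counts) -> float:
    """Share of actual ham that was wrongly flagged as spam."""
    ham = c.good + c.false_positives
    return c.false_positives / ham if ham else 0.0


def pick_mode(results: Dict[str, Counts], max_fp_rate: float) -> Optional[str]:
    """Among modes within the FP budget, return the one catching the most spam."""
    ok = {name: c for name, c in results.items() if fp_rate(c) <= max_fp_rate}
    if not ok:
        return None
    return max(ok, key=lambda name: spam_catch_rate(ok[name]))


# "Today" column from the statistics quoted in this thread.
today = Counts(spam_caught=1419, good=811, spam_missed=2108, false_positives=122)
print(f"accuracy        {accuracy(today):.3f}")         # 0.500
print(f"spam catch rate {spam_catch_rate(today):.3f}")  # 0.402
print(f"FP rate         {fp_rate(today):.3f}")          # 0.131
```

pick_mode expects one Counts entry per trial mode, filled in from your own test runs. A strict max_fp_rate corresponds to the FP-averse choice described above, and a looser one lets the mode that catches more spam win.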

Jonathan

On Jan 26, 2006, at 3:37 PM, Bob Hrbek wrote:

>
> ----- Original Message ----- From: "Jonathan Zdziarski"
> <jonathan@nuclearelephant.com>
>
>
>> Try rebuilding using dspam_train in cvs.
>>
>> Jonathan
>
> When you say "rebuilding", do you mean creating a new merged user with
> the SA corpus, or should I just keep using the merged user I already
> have set up? I don't have loads of real spam mailboxes. I only
> have the database tokens collected as mail passed through the server.
>
> dspam_train [username] [spam_dir] [nonspam_dir]
>
>
>>
>>
>> On Jan 26, 2006, at 3:09 PM, Bob Hrbek wrote:
>>
>>> Messages          Today  This Hour
>>> Spam               1419          1
>>> Good                811          8
>>> Spam Misses        2108          9
>>> False Positives     122          0
>>> Inoculations          0          0
>>> Total              4460         18
>>>
>>>
>>>
>>> Name    Q.Size      TP     TN  FP  FN     SC     IC
>>>
>>> merge2  --      138263  53872   0   0  13789  10602
>>>
>>>
>>> I have this merge2 user above that is a compilation of about 30
>>> users now + the SA spam/ham corpus + about 6 months of spam/ham
>>> e-mail collection and training by 15 people.
>>>
>>> I'm going to lose my mind if I click on the retrain link
>>> anymore. Is there something I can do with the database or
>>> ANYTHING to make dspam more accurate?
>>>
>>>
>>> thanks,
>>> Bob
>>>
Received on Thu Jan 26 15:43:48 2006
