Re: [dspam-users] spam factors

From: Russ Fink <russfink@hotmail.com>
Date: Mon Aug 01 2005 - 17:58:18 EDT

Another possible problem: if your training corpus has ^M characters (DOS-style
carriage returns) in it, that *might* cause DSPAM to think everything is a
header. Run dos2unix on the corpus first. This one bit me, I *think*. The
symptom is absolutely zero keywords from the message body, and 100% of
keywords from the headers.
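
If dos2unix is not at hand, the same cleanup can be sketched in a few lines of
Python. This is only an illustration: the function names are made up here, and
the explanation in the comments (a "blank" line that still holds a carriage
return may not be recognised as the end of the headers) is my guess at the
cause, not something confirmed from the DSPAM source.

```python
def count_carriage_returns(data: bytes) -> int:
    """Return how many carriage-return (^M) bytes the data contains."""
    return data.count(b"\r")


def crlf_to_lf(data: bytes) -> bytes:
    """Convert DOS line endings (CRLF) to Unix line endings (LF).

    Assumption: with CRLF endings, the empty line that separates headers
    from body is really "\r", so a parser looking for a truly empty line
    may never see the end of the headers.
    """
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    sample = b"Subject: test\r\n\r\nHello body\r\n"
    print(count_carriage_returns(sample), "carriage returns before cleanup")
    print(crlf_to_lf(sample))
```

Read each corpus file in binary mode, pass the bytes through crlf_to_lf(), and
write the result back before training.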

> >
> > How does DSPAM decide which tokens to use when classifying an email?
>
>I assume that it takes a number of tokens with the highest scores, plus a
>number of tokens with the lowest scores, and combines them with the
>configured Bayesian algorithm...
>
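
To make that assumption concrete, here is a rough sketch of the general idea:
keep the tokens whose probabilities sit farthest from the neutral 0.5, then
combine them with the naive Bayes formula popularised by Paul Graham. This is
NOT taken from the DSPAM source; the function names, the default of 15 tokens
and the clamping bounds are all illustrative choices.

```python
from math import prod


def most_interesting(token_probs, n=15):
    """Return the n (token, probability) pairs farthest from the neutral 0.5."""
    ranked = sorted(token_probs.items(),
                    key=lambda kv: abs(kv[1] - 0.5),
                    reverse=True)
    return ranked[:n]


def graham_combine(probs, floor=0.0001, ceiling=0.9999):
    """Combine per-token spam probabilities into one overall probability.

    Each value is clamped first so a single 0.0 or 1.0 cannot force the
    result or cause a division by zero.
    """
    clamped = [min(max(p, floor), ceiling) for p in probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


if __name__ == "__main__":
    # Hypothetical per-token probabilities, not real DSPAM data.
    probs = {"use+when": 0.0001, "Subject*factors": 0.0002,
             "viagra": 0.99, "the": 0.5}
    chosen = most_interesting(probs, n=3)
    print(chosen)
    print(graham_combine([p for _, p in chosen]))
```

With a selection rule like this, a token only makes the list if it has been
seen often enough to move away from 0.5.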
> > Most of the time when looking at the X-DSPAM-Factors header, it shows
> > mostly tokens taken from the email header.
>
>Maybe your database isn't populated enough yet? See how my DSPAM factored
>your own email: you'll see both tokens from the headers and tokens from the
>body:
>
>X-DSPAM-Result: Innocent
>X-DSPAM-Confidence: 0.9965
>X-DSPAM-Probability: 0.0000
>X-DSPAM-Signature: 42ee86e274031263121426
>X-DSPAM-Factors: 27,
> use+when, 0.00010,
> the+DSPAM, 0.00010,
> Subject*factors, 0.00020,
> does+DSPAM, 0.00020,
> tokens+to, 0.00020,
> decide+which, 0.00020,
> From*Elliot, 0.00020,
> taken+from, 0.00020,
> Sender*owner, 0.00077,
> DSPAM, 0.00207,
> DSPAM, 0.00207,
> Subject*dspam, 0.00317,
> Return-Path*owner, 0.00466,
> *owner, 0.00467,
> To*users, 0.00600,
> Sender*nuclearelephant, 0.00600,
> *dspam+users, 0.00600,
> *owner+dspam, 0.00600,
> Sender*dspam+users, 0.00600,
> Sender*dspam, 0.00600,
> Return-Path*nuclearelephant+com, 0.00600,
> Received*dspam-users, 0.00600,
> Received*dspam-users, 0.00600,
> *nuclearelephant, 0.00600,
> To*users+lists, 0.00600,
> Sender*owner+dspam, 0.00600,
> Sender*lists+nuclearelephant, 0.00600
>
>--
>Michel Bouissou <michel@bouissou.net> OpenPGP ID 0xDDE8AC6E
>
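
To check whether you are seeing the header-only symptom, it can help to count
how many factors come from headers. The sketch below parses the value of an
X-DSPAM-Factors header laid out like the one quoted above (a count followed by
token, probability pairs). It assumes, judging only from that sample, that a
token containing "*" (Subject*factors, From*Elliot) is header-derived and that
tokens never contain commas. The function names are made up for illustration.

```python
def parse_factors(value: str):
    """Parse 'N, token, prob, token, prob, ...' into (N, [(token, prob), ...])."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    count = int(parts[0])
    rest = parts[1:]
    pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest) - 1, 2)]
    return count, pairs


def header_share(pairs):
    """Return the fraction of tokens that look header-derived (contain '*')."""
    if not pairs:
        return 0.0
    header_tokens = [tok for tok, _ in pairs if "*" in tok]
    return len(header_tokens) / len(pairs)


if __name__ == "__main__":
    # A shortened version of the header quoted above, unfolded onto one line.
    value = ("4, use+when, 0.00010, Subject*factors, 0.00020, "
             "DSPAM, 0.00207, To*users, 0.00600")
    count, pairs = parse_factors(value)
    print(count, pairs)
    print("header share:", header_share(pairs))
```

A header share of 1.0 across many messages would match the symptom described
at the top of this message.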
Received on Mon Aug 1 17:59:08 2005
