----- Original Message -----
From: "Russ Fink" <russfink@hotmail.com>
> Another possible problem is if your training corpus has ^M characters in it,
> that *might* cause it to think everything is a header. Do a dos2unix on it
> first. This one bit me, I *think*. The symptom is absolutely zero keywords
> from the message text, and 100% of keywords from the header.
Well, I do get keywords from the message text, just very few. This was
more a curiosity than anything.
I'm using "Feature chained" and "Algorithm graham burton", if that matters.
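
For anyone who wants to check a corpus for ^M characters before running
dos2unix on it, here is a minimal sketch. The function names and the sample
message are made up for illustration, and the conversion only approximates
what dos2unix does (CRLF to LF; a lone CR is left alone).

```python
def count_crlf_lines(data: bytes) -> int:
    """Return how many lines in `data` end with CRLF (shown as ^M in many editors)."""
    return sum(
        1 for line in data.splitlines(keepends=True) if line.endswith(b"\r\n")
    )


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, roughly what dos2unix does."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical sample message: three CRLF-terminated lines, one LF-terminated.
sample = b"Subject: test\r\n\r\nbody line one\r\nbody line two\n"

print(count_crlf_lines(sample))
print(normalize_newlines(sample))
```

To use it on a real file, read the file in binary mode (open(path, "rb"))
so Python does not translate the line endings before you can count them.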
>
> > >
> > > How does DSPAM decide which tokens to use when classifying an email?
> >
> >I assume that it takes a number of tokens with the highest scores, plus a
> >number of tokens with the lowest scores, and combines them using the
> >configured Bayesian algorithm...
> >
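
The assumption above can be sketched roughly as follows. This is not DSPAM's
code: it follows the "most interesting tokens" idea from Paul Graham's
"A Plan for Spam", where the tokens farthest from 0.5 (very high or very low)
are kept and combined with a naive Bayes formula. The function names, the
default window of 15 (Graham's choice), and the "free+offer" and "hello"
tokens are assumptions for illustration; the other probabilities are copied
from the X-DSPAM-Factors header quoted in this thread.

```python
from math import prod


def most_interesting(token_probs, window=15):
    """Keep the `window` tokens whose probability is farthest from neutral 0.5."""
    ranked = sorted(
        token_probs.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True
    )
    return ranked[:window]


def combine(probs):
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


token_probs = {
    "use+when": 0.00010,
    "the+DSPAM": 0.00010,
    "DSPAM": 0.00207,
    "free+offer": 0.99,  # hypothetical spammy token
    "hello": 0.45,       # hypothetical near-neutral token
}

selected = most_interesting(token_probs, window=4)
print([token for token, _ in selected])
print(combine([p for _, p in selected]))
```

With a window of 4 the near-neutral "hello" token is dropped, and the three
very innocent tokens outweigh the single spammy one, so the combined
probability stays close to zero.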
> > > Most of the time when looking at the X-DSPAM-Factors header, it shows
> > > mostly tokens taken from the email header.
> >
> >Maybe your database isn't yet populated enough? See how my DSPAM factored
> >your own email, you'll see both tokens from the headers and tokens from the
> >body:
> >
> >X-DSPAM-Result: Innocent
> >X-DSPAM-Confidence: 0.9965
> >X-DSPAM-Probability: 0.0000
> >X-DSPAM-Factors: 27,
> > use+when, 0.00010,
> > the+DSPAM, 0.00010,
> > Subject*factors, 0.00020,
> > does+DSPAM, 0.00020,
> > tokens+to, 0.00020,
> > decide+which, 0.00020,
> > From*Elliot, 0.00020,
> > taken+from, 0.00020,
> > Sender*owner, 0.00077,
> > DSPAM, 0.00207,
> > DSPAM, 0.00207,
> > Subject*dspam, 0.00317,
> > Return-Path*owner, 0.00466,
> > *owner, 0.00467,
> > To*users, 0.00600,
> > Sender*nuclearelephant, 0.00600,
> > *dspam+users, 0.00600,
> > *owner+dspam, 0.00600,
> > Sender*dspam+users, 0.00600,
> > Sender*dspam, 0.00600,
> > Return-Path*nuclearelephant+com, 0.00600,
> > Received*dspam-users, 0.00600,
> > Received*dspam-users, 0.00600,
> > *nuclearelephant, 0.00600,
> > To*users+lists, 0.00600,
> > Sender*owner+dspam, 0.00600,
> > Sender*lists+nuclearelephant, 0.00600
> >
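
To see how many factors come from headers versus the body, a header like the
one above can be tallied with a short script. This is a sketch under two
assumptions: that the value is a count followed by comma-separated
token/probability pairs with no commas inside tokens, and that a token
containing "*" (such as Subject*factors or *owner) is a header token. The
function names are made up for illustration.

```python
def parse_factors(value):
    """Parse an X-DSPAM-Factors value into (count, [(token, probability), ...])."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    count = int(parts[0])
    pairs = [
        (parts[i], float(parts[i + 1])) for i in range(1, len(parts) - 1, 2)
    ]
    return count, pairs


def split_header_body(pairs):
    """Split tokens by the assumed convention that '*' marks a header token."""
    header = [token for token, _ in pairs if "*" in token]
    body = [token for token, _ in pairs if "*" not in token]
    return header, body


# Shortened excerpt of the header quoted above (only 4 of the 27 factors).
value = """27,
 use+when, 0.00010,
 Subject*factors, 0.00020,
 DSPAM, 0.00207,
 *owner, 0.00467"""

count, pairs = parse_factors(value)
header, body = split_header_body(pairs)
print(count, len(pairs))
print("header:", header)
print("body:", body)
```

Under the same assumption, the full list of 27 factors quoted above splits
into 19 header tokens and 8 body tokens.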
> >--
> >Michel Bouissou <michel@bouissou.net> OpenPGP ID 0xDDE8AC6E
> >
Received on Mon Aug 1 18:22:52 2005