Re: [dspam-users] 3.4.6: tum builds spam corpus very slowly and spam subject always on

From: Andreas Klemm <andreas@klemm.apsfilter.org>
Date: Mon Aug 08 2005 - 06:20:05 EDT

On Mon, Aug 08, 2005 at 11:34:35AM +0300, Ion-Mihai Tetcu wrote:
>
> [ Now why I didn't receive this on dspam-users also ? ]
>
> On Mon, 8 Aug 2005 08:27:35 +0200
> Andreas Klemm <andreas@klemm.apsfilter.org> wrote:
>
> > Hi,
> >
> > it's about this dspam version (dspam-3.4.6.20050523.0845).
>
> Can you update to 3.4.8 ? From what I remember there was a training bug
> that got fixed.

I can try that later, thanks!
I hope the database can stay as it is.
Or do I need to set up the database from scratch?

> > This time I tried to set up dspam so that it gets trained.
> > Training from a corpus sometimes gave me strange results,
> > weighted too heavily to one side or the other (spam/innocent).
>
> As a rule, I don't like corpus;

Haha, well, but even if you don't create one, one will be
created for you in the long run ;-)

Or do you mean *foreign*, ready-to-use corpora?

> > But the corpus is only growing very slowly.
> >
> > It seems to me as if training will take a year or even more,
> > even though I get approx. 2000 mails a day, mainly mailing list
> > traffic and ... spam.
>
> This is strange, AFAIR tum means teft until the magic border is
> reached.

Is tum OK for me? Or should I use teft? See further below.

>
> > Look here: only 157 and 11 for the spam and innocent corpus.
> >
> > That few in about 10 weeks.
> > Look at the date of my fresh db install:
> > root@titan[ttyp2]{221} /var/db/pkg/postgresql-server-8.0.3 ll
> > total 80
> > -rw-r--r-- 1 root wheel 58 May 21 19:07 +COMMENT
>
> I've blown my db some months ago and from what I remember it took about
> 2 or 3 weeks to get good filtering (but with teft). Same type of emails
> as you.

Should I use teft then?

> > root@titan[ttyp2]{211} ~ dspam_stats andreas
> > andreas TS: 6204 TI: 23156 SM: 1045 IM: 77 SC: 157 IC: 11
> >
> > Part of my config.
> >
> > TrainingMode tum
> > #Feature sbph
> > Feature chained
> > Feature tb=4
>
> Maybe try going with a smaller tb; it works for me, but I have a
> 1/30 spam/ham ratio and yours is 1/4, so you might get some more FPs.

False positives would be bad for me.
I cannot review all of my spam daily.
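
For reference, the two ratios quoted above can be checked from the TS and
TI counters that dspam_stats prints. This is only a rough sketch; the
helper name is made up for illustration, and the numbers are copied from
the two dspam_stats outputs in this thread.

```python
# Hypothetical helper (not part of dspam): how many innocent mails
# arrive per spam mail, based on the TS/TI counters from dspam_stats.
def innocent_per_spam(total_spam, total_innocent):
    return total_innocent / total_spam

# Counters copied from the dspam_stats outputs quoted in this thread.
andreas_ratio = innocent_per_spam(6204, 23156)
itetcu_ratio = innocent_per_spam(4518, 136103)

print(f"andreas: 1 spam per {andreas_ratio:.1f} innocent")
print(f"itetcu:  1 spam per {itetcu_ratio:.1f} innocent")
```

So "1/4" and "1/30" are fair roundings of what the counters show.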

> > #Feature whitelist
> > Feature noise
> > Algorithm graham burton
> > PValue graham
> > Preference "signatureLocation=headers" # 'message' or 'headers'
> > Preference "spamAction=tag"
> >
> > #Preference "spamSubject=SPAM"
> > BTW, although I commented out spamSubject, I still get it
> > prepended to the subject, as can be seen here:
> >
> > 1424 N Jan 30 ___ɯůS°Ï¡ãº^°®§ ( 49) [SPAM] »__ÃP__£___ɧAªº¹q__£®Ä¯à¡A¥Î³Ì«K©yªº»ù
> > 1425 N Feb 01 ~Áú­·ºô­¶___j®v~ ( 194) [SPAM] ³ÌHOTÁú¬yºô­¶§A___]¥i¥H__Ö__³¡F³Ì¶W­ÈÁ
>
> This works for me (no spam subj, and quarantine)

Then I'll try the upgrade and let's see if this is fixed.
Though this is not high priority for me.

> #Preference "spamAction=quarantine"
> Preference "signatureLocation=headers" # 'message' or 'headers'
> Preference "showFactors=off"
> Preference "spamAction=tag"
> #Preference "spamSubject=SPAM"
>
> root@it/PU> /usr/ports/print/jadetex [11:17:21] 0
> # dspam_stats -H itetcu
> itetcu:
> TS Total Spam: 4518
> TI Total Innocent: 136103
> SM Spam Misclassified: 1210
> IM Innocent Misclassified: 1
> SC Spam Corpusfed: 60
> IC Innocent Corpusfed: 12
> TL Training Left: 0
> SR Spam Catch Rate: 78.88%
> IR Innocent Catch Rate: 100.00%
> OR Overall Rate/Accuracy: 99.15%

root@titan[ttyp3]{201} ~ dspam_stats -H andreas
andreas:
                TS Total Spam: 6269
                TI Total Innocent: 24034
                SM Spam Misclassified: 1045
                IM Innocent Misclassified: 77
                SC Spam Corpusfed: 157
                IC Innocent Corpusfed: 11
                TL Training Left: 0
                SR Spam Catch Rate: 85.71%
                IR Innocent Catch Rate: 99.68%
                OR Overall Rate/Accuracy: 96.43%

You have better accuracy than me.
Let's see what happens after the update.
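
To compare the two outputs it helps to see how the three percentages
relate to the raw counters. The formulas below are inferred from the
numbers in this thread (they reproduce SR, IR and OR for both accounts),
not taken from the dspam source, so treat them as an assumption. The
function name is made up for illustration.

```python
# Sketch: recompute SR / IR / OR from the dspam_stats counters.
# Formulas are inferred from the figures in this thread, not from dspam itself.
def catch_rates(ts, ti, sm, im):
    spam_rate = 100.0 * ts / (ts + sm)          # SR: spam catch rate
    innocent_rate = 100.0 * ti / (ti + im)      # IR: innocent catch rate
    overall = 100.0 * (ts + ti) / (ts + ti + sm + im)  # OR: overall accuracy
    return spam_rate, innocent_rate, overall

# Counters copied from the two "dspam_stats -H" outputs above.
accounts = {
    "andreas": catch_rates(ts=6269, ti=24034, sm=1045, im=77),
    "itetcu": catch_rates(ts=4518, ti=136103, sm=1210, im=1),
}

for user, (sr, ir, overall) in accounts.items():
    print(f"{user}: SR {sr:.2f}%  IR {ir:.2f}%  OR {overall:.2f}%")
```

If these formulas are right, the overall rate is dominated by whichever
class has more mail, which would explain a higher OR together with a
lower SR on a mailbox that is almost all innocent mail.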

Thanks for answering

        Andreas ///

-- 
http://www.64bits.de
http://www.apsfilter.org
http://people.FreeBSD.org/~andreas
Received on Mon Aug 8 06:25:58 2005
