Re: [dspam-users] 3.4.6: tum builds spam corpus very slowly and spam subject always on

From: Norman Maurer <nm@spam-box.de>
Date: Mon Aug 08 2005 - 06:39:19 EDT

You can keep using your old db; I have had no problems with the new version here.

But you say you had problems after training, so you may have to create a
new db to clean those up :-(

On Mo, 2005-08-08 at 12:20 +0200, Andreas Klemm wrote:
> On Mon, Aug 08, 2005 at 11:34:35AM +0300, Ion-Mihai Tetcu wrote:
> >
> > [ Now why I didn't receive this on dspam-users also ? ]
> >
> > On Mon, 8 Aug 2005 08:27:35 +0200
> > Andreas Klemm <andreas@klemm.apsfilter.org> wrote:
> >
> > > Hi,
> > >
> > > It's about this dspam version (dspam-3.4.6.20050523.0845).
> >
> > Can you update to 3.4.8 ? From what I remember there was a training bug
> > that got fixed.
>
> I can try that later, thanks!
> I hope the database can stay.
> Or do I need to set up a new database?
>
> > > This time I tried to set up dspam so that it gets trained,
> > > since training by corpus sometimes gives me strange results,
> > > weighted too heavily to one side or the other (spam/innocent).
> >
> > As a rule, I don't like corpus;
>
> Haha, well, but even if you don't create one, one will be
> created for you in the long run ;-)
>
> Or do you mean "*foreign* ready to use" corpora ?
>
> > > But the corpus is only increasing very slowly.
> > >
> > > It seems to me as if training will take a year or even more,
> > > though I get approx. 2000 mails a day, mainly mailing list
> > > traffic and ... spam.
> >
> > This is strange, AFAIR tum means teft until the magic border is
> > reached.
>
> Is tum ok for me ? Or should I use teft, see later below.
>
> >
> > > Look here: Only 157 and 11 for spam and innocent corpus.
> > >
> > > That few in about 10 weeks.
> > > Look at when I freshly installed my db:
> > > root@titan[ttyp2]{221} /var/db/pkg/postgresql-server-8.0.3 ll
> > > total 80
> > > -rw-r--r-- 1 root wheel 58 May 21 19:07 +COMMENT
> >
> > I've blown my db some months ago and from what I remember it took about
> > 2 or 3 weeks to get good filtering (but with teft). Same type of emails
> > as you.
>
> Should I use teft then ?
>
> > > root@titan[ttyp2]{211} ~ dspam_stats andreas
> > > andreas TS: 6204 TI: 23156 SM: 1045 IM: 77 SC: 157 IC: 11
> > >
> > > Part of my config.
> > >
> > > TrainingMode tum
> > > #Feature sbph
> > > Feature chained
> > > Feature tb=4
> >
> > Maybe try going with a smaller tb; it works for me, but I have a
> > 1/30 spam/ham ratio and yours is 1/4, so you might get some more FPs.
>
> False positives would be bad for me.
> I cannot review all of my spam daily.
>
> > > #Feature whitelist
> > > Feature noise
> > > Algorithm graham burton
> > > PValue graham
> > > Preference "signatureLocation=headers" # 'message' or 'headers'
> > > Preference "spamAction=tag"
> > >
> > > #Preference "spamSubject=SPAM"
> > > btw, although I commented out the spamsubject I still get it
> > > prepended to the subject as can be seen here.
> > >
> > > 1424 N Jan 30 [mis-encoded text] ( 49) [SPAM] [mis-encoded subject]
> > > 1425 N Feb 01 [mis-encoded text] ( 194) [SPAM] [mis-encoded subject]
> >
> > This works for me (no spam subj, and quarantine)
>
> Then I'll try the upgrade and let's see if this is fixed,
> though this is not high priority for me.
>
> > #Preference "spamAction=quarantine"
> > Preference "signatureLocation=headers" # 'message' or 'headers'
> > Preference "showFactors=off"
> > Preference "spamAction=tag"
> > #Preference "spamSubject=SPAM"
> >
> > root@it/PU> /usr/ports/print/jadetex [11:17:21] 0
> > # dspam_stats -H itetcu
> > itetcu:
> > TS Total Spam: 4518
> > TI Total Innocent: 136103
> > SM Spam Misclassified: 1210
> > IM Innocent Misclassified: 1
> > SC Spam Corpusfed: 60
> > IC Innocent Corpusfed: 12
> > TL Training Left: 0
> > SR Spam Catch Rate: 78.88%
> > IR Innocent Catch Rate: 100.00%
> > OR Overall Rate/Accuracy: 99.15%
>
> root@titan[ttyp3]{201} ~ dspam_stats -H andreas
> andreas:
> TS Total Spam: 6269
> TI Total Innocent: 24034
> SM Spam Misclassified: 1045
> IM Innocent Misclassified: 77
> SC Spam Corpusfed: 157
> IC Innocent Corpusfed: 11
> TL Training Left: 0
> SR Spam Catch Rate: 85.71%
> IR Innocent Catch Rate: 99.68%
> OR Overall Rate/Accuracy: 96.43%
>
> You have better accuracy than me.
> Let's see what happens after the update.
>
> Thanks for answering
>
> Andreas ///
>
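For comparing the two dspam_stats -H outputs quoted above, here is a small sketch that recomputes the three rates and the ham/spam ratio from the raw counters. The formulas are an assumption inferred from the quoted figures (they reproduce the printed percentages for both users); they are not taken from the DSPAM sources. The function names rates and ham_per_spam are made up for this example.

```python
# Sketch only: formulas are inferred from the quoted numbers, not from DSPAM itself.

def rates(ts, ti, sm, im):
    """Return (spam catch, innocent catch, overall) rates in percent.

    ts/ti: Total Spam / Total Innocent, sm/im: Spam / Innocent Misclassified.
    """
    sr = 100.0 * ts / (ts + sm)
    ir = 100.0 * ti / (ti + im)
    overall = 100.0 * (ts + ti) / (ts + ti + sm + im)
    return round(sr, 2), round(ir, 2), round(overall, 2)


def ham_per_spam(ts, ti):
    """Innocent messages per spam message (the '1/30' vs '1/4' comparison)."""
    return ti / ts


# Counters copied from the two dspam_stats -H outputs in this thread.
users = {
    "itetcu": dict(ts=4518, ti=136103, sm=1210, im=1),
    "andreas": dict(ts=6269, ti=24034, sm=1045, im=77),
}

for name, c in users.items():
    sr, ir, overall = rates(**c)
    ratio = ham_per_spam(c["ts"], c["ti"])
    print(f"{name}: SR {sr:.2f}%  IR {ir:.2f}%  OR {overall:.2f}%  "
          f"spam/ham about 1/{ratio:.0f}")
```

With a spam/ham ratio near 1/4, the 77 misclassified innocent mails weigh on the overall figure far more than the single one does at 1/30, which fits the warning about false positives above.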
Received on Mon Aug 8 06:39:54 2005
