Re: [dspam-users] 3.4.6: tum builds spam corpus very slowly and spam subject always on

From: Ion-Mihai Tetcu <itetcu@people.tecnik93.com>
Date: Mon Aug 08 2005 - 04:34:35 EDT

 [ Now why didn't I receive this on dspam-users as well? ]

On Mon, 8 Aug 2005 08:27:35 +0200
Andreas Klemm <andreas@klemm.apsfilter.org> wrote:

> Hi,
>
> It's about this dspam version (dspam-3.4.6.20050523.0845).

Can you update to 3.4.8? From what I remember, there was a training bug
that got fixed.

> This time I tried to set up dspam so that it gets trained,
> since when training by corpus I sometimes get strange results,
> weighted too much to one side or the other (spam/innocent).

As a rule, I don't like corpus training.

> But the corpus is increasing only very slowly.
>
> It seems to me as if training will take a year or even more,
> though I get approx. 2000 mails a day. Mainly mailing list
> traffic and ... spam.

This is strange. AFAIR, tum (train until mature) behaves like teft until
the magic maturity border is reached.

> Look here: only 157 and 11 for the spam and innocent corpus.
>
> That few in about 10 weeks.
> Look at when I freshly installed my db:
> root@titan[ttyp2]{221} /var/db/pkg/postgresql-server-8.0.3 ll
> total 80
> -rw-r--r-- 1 root wheel 58 May 21 19:07 +COMMENT

I blew away my db some months ago, and from what I remember it took about
2 or 3 weeks to get good filtering (but with teft). Same type of email
as yours.

> root@titan[ttyp2]{211} ~ dspam_stats andreas
> andreas TS: 6204 TI: 23156 SM: 1045 IM: 77 SC: 157 IC: 11
>
> Part of my config.
>
> TrainingMode tum
> #Feature sbph
> Feature chained
> Feature tb=4

You could also try a smaller tb. It works for me, but I have a 1/30
spam/ham ratio and yours is about 1/4, so you might get some more FPs.
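
The ratios mentioned here can be checked against the dspam_stats counters
(TS = total spam, TI = total innocent). A minimal Python sketch, assuming
the one-line "NAME TS: n TI: n ..." layout quoted in this thread. The
function names are made up for illustration and are not part of DSPAM, and
the itetcu line is my -H output rewritten by hand into the one-line form:

```python
import re

def parse_stats(line):
    """Parse a one-line dspam_stats summary into a dict of integer counters."""
    # Matches pairs such as "TS: 6204"; the leading user name is ignored.
    return {key: int(value)
            for key, value in re.findall(r"\b([A-Z]{2}):\s*(\d+)", line)}

def ham_per_spam(counters):
    """Return how many innocent messages were seen per spam message (TI / TS)."""
    return counters["TI"] / counters["TS"]

andreas = parse_stats("andreas TS: 6204 TI: 23156 SM: 1045 IM: 77 SC: 157 IC: 11")
itetcu = parse_stats("itetcu TS: 4518 TI: 136103 SM: 1210 IM: 1 SC: 60 IC: 12")

print(f"andreas: 1/{ham_per_spam(andreas):.1f} spam/ham")
print(f"itetcu:  1/{ham_per_spam(itetcu):.1f} spam/ham")
```

This only compares message volumes; it says nothing about how a given tb
value will behave on either mailbox.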

> #Feature whitelist
> Feature noise
> Algorithm graham burton
> PValue graham
> Preference "signatureLocation=headers" # 'message' or 'headers'
> Preference "spamAction=tag"
>
> #Preference "spamSubject=SPAM"
> btw, although I commented out spamSubject, I still get it
> prepended to the subject, as can be seen here.
>
> 1424 N Jan 30 ___ɯůS°Ï¡ãº^°®§ ( 49) [SPAM] »__ÃP__£___ɧAªº¹q__£®Ä¯à¡A¥Î³Ì«K©yªº»ù
> 1425 N Feb 01 ~Áú­·ºô­¶___j®v~ ( 194) [SPAM] ³ÌHOTÁú¬yºô­¶§A___]¥i¥H__Ö__³¡F³Ì¶W­ÈÁ

This works for me (no spam subject, and quarantine):

#Preference "spamAction=quarantine"
Preference "signatureLocation=headers" # 'message' or 'headers'
Preference "showFactors=off"
Preference "spamAction=tag"
#Preference "spamSubject=SPAM"

root@it/PU> /usr/ports/print/jadetex [11:17:21] 0
 # dspam_stats -H itetcu
itetcu:
                TS Total Spam: 4518
                TI Total Innocent: 136103
                SM Spam Misclassified: 1210
                IM Innocent Misclassified: 1
                SC Spam Corpusfed: 60
                IC Innocent Corpusfed: 12
                TL Training Left: 0
                SR Spam Catch Rate: 78.88%
                IR Innocent Catch Rate: 100.00%
                OR Overall Rate/Accuracy: 99.15%
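
The three percentages can be reproduced from the four counters above them.
A small Python sketch; the formulas are inferred from the numbers in this
output, not taken from the dspam_stats source, so treat them as an
assumption. The function name is hypothetical:

```python
def rates(ts, ti, sm, im):
    """Return (spam catch rate, innocent catch rate, overall accuracy) in percent.

    Assumed formulas, inferred from the output above:
      SR = TS / (TS + SM)
      IR = TI / (TI + IM)
      OR = 1 - (SM + IM) / (TS + TI + SM + IM)
    """
    sr = 100.0 * ts / (ts + sm)
    ir = 100.0 * ti / (ti + im)
    overall = 100.0 * (1.0 - (sm + im) / (ts + ti + sm + im))
    return sr, ir, overall

# Counters taken from the dspam_stats -H output above.
sr, ir, overall = rates(ts=4518, ti=136103, sm=1210, im=1)
print(f"SR {sr:.2f}%  IR {ir:.2f}%  OR {overall:.2f}%")
```

With these counters the sketch gives the same 78.88%, 100.00% and 99.15%
as the output above. The same function can be fed the TS/TI/SM/IM values
from any other dspam_stats line.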

-- 
IOnut
Unregistered ;) FreeBSD "user"
  "Intellectual Property" is   nowhere near as valuable   as "Intellect"
Received on Mon Aug 8 04:54:27 2005
