[dspam-users] Re: My consideration to switch to DSPAM (postfix+maildrop)

From: Andreas Neuhaus <andy@stud.fh-dortmund.de>
Date: Mon Nov 14 2005 - 12:41:50 EST

> Regarding the spam detection from amavisd-new: As this is basically
> SpamAssassin, you could get probably a better filtering with DSPAM (at
> least I could).

I wish it did. As I said, I turned off amavisd/SpamAssassin for my account
and am using only dspam now. Accuracy is getting better (around 50% now), but
it is still much worse than with amavisd:

root TP: 0 TN: 0 FP: 0 FN: 0 SC: 2397 IC: 2750
andy TP: 85 TN: 0 FP: 0 FN: 78 SC: 0 IC: 0

All spam detection comes from the trained root dictionary (global group). My
personal dictionary seems to be still useless:
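
As a quick check of the "around 50%" figure, here is a minimal sketch that
computes accuracy from one stats line. It assumes the field layout shown
above (user, then TP/TN/FP/FN/SC/IC as label-value pairs); the function name
stats_accuracy is made up for illustration, and the SC/IC (corpus-fed)
counters are deliberately left out of the calculation.

```shell
# Compute overall accuracy, (TP + TN) / (TP + TN + FP + FN), from one stats
# line. Field positions assume the "user TP: n TN: n FP: n FN: n ..." layout.
stats_accuracy() {
    echo "$1" | awk '{
        tp = $3; tn = $5; fpos = $7; fneg = $9
        total = tp + tn + fpos + fneg
        # A user with no classified mail yet has no meaningful accuracy.
        if (total == 0) { print $1, "n/a"; next }
        printf "%s %.1f%%\n", $1, 100 * (tp + tn) / total
    }'
}

stats_accuracy "andy TP: 85 TN: 0 FP: 0 FN: 78 SC: 0 IC: 0"
```

For the andy line this prints "andy 52.1%" (85 of 163 messages classified
correctly); the root line has no classified mail, so it would print "root n/a".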

# for f in .Spam/cur/*; do echo "$f"; done | wc --lines
112
# for f in .Spam/cur/*; do dspam --user andy --classify <"$f"; done |
    grep -i innocent | wc --lines
112
# for f in .Spam/cur/*; do dspam --user root --classify <"$f"; done |
    grep -i innocent | wc --lines
29

Of 112 spam mails, the trained root dictionary produces 29 false negatives,
while my personal dictionary produces 100% false negatives. What the hell am
I doing wrong?
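
The same check can be wrapped in a small function so both dictionaries are
tested identically. This is only a sketch: it reuses the dspam --user ...
--classify invocation and the case-insensitive "innocent" match from the
loops above, while the function name count_innocent and the DSPAM_BIN
override (handy for testing without a live dspam) are made up here.

```shell
# Count how many messages in a folder a user's dictionary calls innocent.
# DSPAM_BIN is a hypothetical override; it defaults to "dspam" on the PATH.
count_innocent() {
    user=$1
    dir=$2
    total=0
    innocent=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # skip an empty or missing folder
        total=$((total + 1))
        # Same classify call and match as in the manual loops.
        if "${DSPAM_BIN:-dspam}" --user "$user" --classify <"$f" |
            grep -qi innocent; then
            innocent=$((innocent + 1))
        fi
    done
    echo "$user: $innocent of $total classified as innocent"
}
```

Running count_innocent andy .Spam/cur and count_innocent root .Spam/cur
should then reproduce the two counts above in a directly comparable form.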

> I was surprised that you don't want amavisd anymore because it has
> more to offer than only spam scanning - detecting viruses for example.

Which dspam can also do via clamav now, so if the accuracy were at least as
good as with amavisd (which it should be; I still think I'm doing something
wrong), there would not be many reasons left to choose amavisd :)

> Did you look into the system.log (on my system in
> /usr/local/var/dspam)?

The system.log shows a lot of I entries that are definitely spam (false
negatives), but meanwhile also a few S entries (true positives). However, all
spam detection seems to come from the global trained dictionary.

> Unfortunately, I can not give any further advice as I used a custom
> trainer script which called DSPAM directly without the corpus option
> (processing a mail and retraining it if it was misclassified,
> alternating spam/ham).

Once I have it running with good accuracy, I plan to set up something similar
(retraining dspam by moving spam into a special IMAP folder, for example).
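
A rough sketch of what such a folder-based retraining job might look like,
to be run from cron for example. It assumes the --class=spam and
--source=error options as I understand them from the README, and that full
messages (not just signatures) are fed in; the function name retrain_spam,
the folder arguments and the DSPAM_BIN override are made up for
illustration.

```shell
# Retrain every message in a "missed spam" folder as spam, then move it away
# so it is not retrained twice.
# DSPAM_BIN is a hypothetical override; it defaults to "dspam" on the PATH.
retrain_spam() {
    user=$1
    src=$2      # folder the user moves missed spam into
    dst=$3      # folder the messages end up in after retraining
    for f in "$src"/*; do
        [ -f "$f" ] || continue        # nothing to do for an empty folder
        if "${DSPAM_BIN:-dspam}" --user "$user" --class=spam --source=error <"$f"; then
            mv "$f" "$dst"/
        else
            # Leave the message in place so the next run can retry it.
            echo "retraining failed for $f" >&2
        fi
    done
}
```

Something like retrain_spam andy .Retrain-Spam/cur .Spam/cur would be the
idea, with .Retrain-Spam being a placeholder folder name.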

> > I'm still confused how groups work. I did read the README about 6 times
> > now, and I'm still confused.
> I find it confusing, too. The README could be improved explaining the
> options better.

I think I now basically understand how groups work, but I wonder if there's a
way to find out which dictionary a mail classification was based on (the
system.log doesn't state it).

And btw, I'm using the mysql storage backend, but dspam still writes .stats
and .log files for users (which can be turned off in the config file afaik) -
wouldn't it be a good idea to also store the log into the database? That way,
a web frontend could gather all data (statistics) from the database without
accessing local files.

regards,
Andreas Neuhaus

Received on Mon Nov 14 12:41:02 2005
