[dspam-users] Re: My consideration to switch to DSPAM (postfix+maildrop)

From: Andreas Neuhaus <andy@stud.fh-dortmund.de>
Date: Sun Nov 13 2005 - 20:42:05 EST

Hi Felix,

> > lot of HOWTOs, FAQs and documentation about DSPAM over the last weeks.
> > From what I read, DSPAM sounds great and I consider to get rid of amavisd
> > now.
> Btw: Why do you want to get rid of amavisd? amavisd or amavisd-new?

Currently we're running amavisd-new, which does a pretty good job of detecting
spam. Yesterday I disabled amavis processing for my personal account and was
surprised how fast my mailbox filled up with junk. I'm now trying to use
dspam (just for my personal account atm) because, as far as I've heard/read,
dspam scales better (I never liked how much memory and CPU amavisd uses
with all that Perl stuff).

Unfortunately I probably did something wrong, because I don't get a useful
accuracy rate with dspam. Current accuracy is at 5.68%, which is not really
useful. I started with basic settings (TrainingMode teft,
TestConditionalTraining on, Feature chained,tb=5,whitelist, Algorithm graham
burton, PValue graham, ProcessorBias on). Starting with empty tables, I
showed dspam 1250 hams and 1250 spams from my IMAP folders by using
dspam_corpus (which calls dspam with --class=innocent|spam --source=corpus
--user andy --mode=teft --feature=chained,noise). The resulting dspam_stats:
andy TP: 0 TN: 0 FP: 0 FN: 0 SC: 1250 IC: 1250
Then I waited a while for more incoming mail, expecting dspam to already
detect a decent amount of spam. Unfortunately it failed to detect
anything. Every new mail was marked innocent (I'm not using the dspam
quarantine; I'm calling dspam from within maildrop's .mailfilter for my
account: xfilter "dspam --user andy --stdout --deliver=innocent,spam").
I thought that most of the false negatives would go away once I had trained
dspam a bit, so I fed all the undetected spam to dspam --user andy
--source=error --class=spam. But after 83 mails, dspam still doesn't detect
my spam. dspam_stats now shows:
andy TP: 0 TN: 5 FP: 0 FN: 83 SC: 1250 IC: 1250
If I run dspam --classify on the above 83 spams, all of them are detected as
innocent, even though I explicitly told dspam that they're spam.
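For what it's worth, the 5.68% seems to count only the mails dspam actually
classified: the 2500 corpus messages (SC/IC) are left out, so it is 5 correct
(TN) out of 88 (5 TN + 83 FN). A quick Python check of that arithmetic,
assuming dspam_stats computes accuracy as (TP+TN)/(TP+TN+FP+FN); parse_stats
and accuracy are my own helper names, not part of dspam:

```python
import re


def parse_stats(line):
    """Parse a dspam_stats line such as 'andy TP: 0 TN: 5 FP: 0 FN: 83 SC: 1250 IC: 1250'."""
    user, rest = line.split(None, 1)
    # Collect every "XX: <number>" pair into a dict of ints.
    counts = {key: int(value) for key, value in re.findall(r"([A-Z]{2}):\s*(\d+)", rest)}
    return user, counts


def accuracy(counts):
    # Assumption: only classified mail (TP/TN/FP/FN) counts;
    # the SC/IC corpus-training counters are ignored.
    classified = counts["TP"] + counts["TN"] + counts["FP"] + counts["FN"]
    if classified == 0:
        return None  # nothing classified yet, so accuracy is undefined
    return 100.0 * (counts["TP"] + counts["TN"]) / classified


user, counts = parse_stats("andy TP: 0 TN: 5 FP: 0 FN: 83 SC: 1250 IC: 1250")
print(f"{user}: {accuracy(counts):.2f}%")  # andy: 5.68%
```

Right after the corpus training (TP: 0 TN: 0 FP: 0 FN: 0) there is nothing to
compute yet, which is why the helper returns None in that case.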

Additionally, I used dspam_sa_trainer to train another user with the
SpamAssassin publiccorpus archives (SC: 2397 IC: 2750). This seemed to work
well: when I ran dspam --classify on the above 83 fresh spams, it detected
about 90% of them.

How can that be? What did I do wrong?

> > best way would be to use the neural network feature, which is
> > unfortunately still experimental.
> Did you consider using a global merged group?
> In that case you should think about training the base user, as you
> probably don't want every user to be able to change the spam tokens for all
> the others. Maybe you have to manually select which forwarded mails will be
> used for training the base user.

I still don't understand how groups work, even after reading the README about
six times. From what I can tell (please correct me if I'm
wrong):

- A shared group means that all users in that group share the same dictionary.
Is that done by assigning every user the same uid in dspam_virtual_uids?
- Inoculation means that every user in a group has his own dictionary. When
training, a user submits not only to his own dictionary but also to those of
every other user in his group. Wouldn't that mean that all users of that group
have their dictionaries synchronized (if they all start empty or with the same
data)?
- Classification means that a user will consult the dictionaries of all other
group members if his own dictionary fails to classify a message reliably.
- A global group will make all users use the globaluser's dictionary if their
own dictionary fails to classify a message reliably.
- A merged group is like a global group, but does (always?) merge the user's
personal dictionary with the globaluser's dictionary in real-time. This makes
the globaluser's dictionary behave as an offset to the user's personal
dictionary (and not as a fallback dict like in a global group).
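To check that I have the difference between inoculation, global and merged
groups straight, here is a toy Python model of my understanding above. It is
NOT dspam's code: the Dictionary class, the helper names, the Graham-style
scoring and the MIN_TRAINED threshold are all my own inventions for
illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import prod

# Hypothetical threshold: how many trained messages a personal dictionary
# needs before the toy "global group" stops falling back to globaluser.
MIN_TRAINED = 4


@dataclass
class Dictionary:
    """Toy dictionary: per-token message counts for spam and innocent."""
    spam: Counter = field(default_factory=Counter)
    innocent: Counter = field(default_factory=Counter)
    spam_msgs: int = 0
    innocent_msgs: int = 0

    def train(self, tokens, is_spam):
        # Each distinct token is counted once per message.
        if is_spam:
            self.spam.update(set(tokens))
            self.spam_msgs += 1
        else:
            self.innocent.update(set(tokens))
            self.innocent_msgs += 1

    def trained(self):
        return self.spam_msgs + self.innocent_msgs

    def merged_with(self, other):
        # Merged group as I read it: personal counts plus global counts.
        return Dictionary(self.spam + other.spam,
                          self.innocent + other.innocent,
                          self.spam_msgs + other.spam_msgs,
                          self.innocent_msgs + other.innocent_msgs)

    def token_prob(self, token):
        # Relative frequency in spam vs. innocent; 0.5 for unseen tokens.
        s = self.spam[token] / max(self.spam_msgs, 1)
        h = self.innocent[token] / max(self.innocent_msgs, 1)
        if s + h == 0:
            return 0.5
        # Clamp so no single token is ever absolutely certain.
        return min(max(s / (s + h), 0.01), 0.99)

    def classify(self, tokens):
        # Graham-style combination of the per-token probabilities.
        ps = [self.token_prob(t) for t in set(tokens)]
        a, b = prod(ps), prod(1 - p for p in ps)
        return "spam" if a / (a + b) > 0.5 else "innocent"


def inoculate(dicts, members, tokens, is_spam):
    # Inoculation: one user's training is applied to every group member.
    for name in members:
        dicts[name].train(tokens, is_spam)


def global_group(personal, global_dict):
    # Global group: use globaluser as a fallback while the personal
    # dictionary is still too small.
    return personal if personal.trained() >= MIN_TRAINED else global_dict


def merged_group(personal, global_dict):
    # Merged group: always combine personal and global counts.
    return personal.merged_with(global_dict)


# globaluser, aaa and bbb form a (hypothetical) inoculation group.
dicts = {name: Dictionary() for name in ("globaluser", "aaa", "bbb")}
group = list(dicts)
inoculate(dicts, group, ["cheap", "viagra"], is_spam=True)
inoculate(dicts, group, ["meeting", "lunch"], is_spam=False)

# andy has personally marked four "cheap flights" mails as innocent.
andy = Dictionary()
for _ in range(4):
    andy.train(["cheap", "flights"], is_spam=False)

msg = ["cheap", "viagra"]
print(global_group(andy, dicts["globaluser"]).classify(msg))  # innocent
print(merged_group(andy, dicts["globaluser"]).classify(msg))  # spam
```

In this model the inoculated dictionaries of globaluser, aaa and bbb end up
identical, and once andy's own dictionary passes the threshold the global group
ignores globaluser entirely, while the merged group still lets the global
counts pull "viagra" towards spam.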

If I understand it correctly, in my case, where we have lots of users who
won't ever retrain their false negatives, I should use a global group so that
most users can benefit from the global dictionary and automatically switch to
their own dictionary once it is reliable enough.

Question: Can I combine a global group with an inoculation group for the
global user? The idea is that a few chosen people would be able to retrain
the global dictionary so that it stays up to date with newer spam. Would
something like
  retrainers:inoculation:globaluser,aaa,bbb,ccc // retraining people
  everybody:classification:*globaluser // global group
work? Would that make the retrains of aaa, bbb and ccc inoculate the
globaluser and therefore automatically affect all accounts whose own
dictionaries are still too small?

Additionally, I was thinking about inoculating the globaluser with spam from
some honeypot accounts and maybe with ham sent by the users. Would that be a
reasonably workable setup that doesn't need too much maintenance?

regards,
Andreas Neuhaus

Received on Sun Nov 13 20:41:18 2005
