Hi list, hi Jonathan,
I think I'm getting closer to the cause of my dspam 3.6 learning problems.
Because dspam 3.6 is still not learning for me (it doesn't detect a single
spam), I wrote some scripts that compile and install various dspam versions
into separate subdirs (running as an unprivileged user, not root), then train
each one (like train.pl) and print dspam_stats at the end.
All tests were run on the same system (Gentoo Linux on my notebook), and the
data dirs (normally /var/dspam) were wiped before each test. The configure
parameters were also always the same:
--prefix=/private/dir/for/every/test
--with-dspam-home=/private/ramdisk/dir/for/every/test
--with-dspam-mode=755
--with-dspam-owner=andy
--with-dspam-group=users
Only --with-storage-driver, --with-db-includes and --with-db-libs differ from
test to test.
The original dspam.conf settings were used, except that I turned off
notifications and added the user I'm running the tests as to the list of
trusted users.
Each training test starts by wiping the dspam home (on a ramdisk for
performance), so that dspam begins with absolutely no data. Then the script
trains 500 spam and ham mails from the spamassassin archive and retrains
false positives/negatives.
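For anyone who wants to picture the procedure, here is a minimal Python sketch of such a train-and-retrain loop. It is not my actual script and it does not call dspam: classify and retrain are hypothetical callbacks standing in for whatever invokes the filter, ToyFilter is a made-up stand-in so the sketch can run on its own, and the alternating spam/ham order is only assumed from the progress lines further down.

```python
def interleave(spam, ham):
    """Alternate spam and ham messages: spam, ham, spam, ham, ..."""
    for spam_msg, ham_msg in zip(spam, ham):
        yield spam_msg, True
        yield ham_msg, False


def train(messages, classify, retrain):
    """Classify each message and retrain the filter on every mistake.

    messages -- iterable of (text, is_spam) pairs
    classify -- hypothetical callback, returns True if text is judged spam
    retrain  -- hypothetical callback, called with (text, is_spam) on errors
    Returns the progress string: "." correct, "*" false negative,
    "#" false positive.
    """
    marks = []
    for text, is_spam in messages:
        if classify(text) == is_spam:
            marks.append(".")
        else:
            marks.append("*" if is_spam else "#")
            retrain(text, is_spam)
    return "".join(marks)


class ToyFilter:
    """Made-up stand-in for a real filter: it only remembers exact texts."""

    def __init__(self):
        self.known_spam = set()

    def classify(self, text):
        return text in self.known_spam

    def retrain(self, text, is_spam):
        if is_spam:
            self.known_spam.add(text)
        else:
            self.known_spam.discard(text)
```

A filter that learns produces a few "*" marks at the start and then mostly dots. A filter that never learns keeps printing "*" for every spam.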
Results:
dspam-3.4.9 with libdb4_drv using db-4.2.52_p2: works
dspam-3.6.1 with libdb4_drv using db-4.1.25_p1: doesn't work!
dspam-3.6.2 with libdb4_drv using db-4.1.25_p1: doesn't work!
dspam-3.6.0 with libdb4_drv using db-4.2.52_p2: doesn't work!
dspam-3.6.1 with libdb4_drv using db-4.2.52_p2: doesn't work!
dspam-3.6.2 with libdb4_drv using db-4.2.52_p2: doesn't work!
dspam-3.6.1 with hash_drv: works
dspam-3.6.2 with hash_drv: works
"works" means that after a short learning phase, dspam reliably detected
every spam. This is how I think it's supposed to look:
Training dspam: . = true positive/negative, * = false negative, # = false
positive:
*.*.*.*.*.*.*.*.*.*.....*.....*......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................#..............................................................................................................................................................................................................................................................................................................................................................................................................
===============================================
andy TP: 486 TN: 497 FP: 1 FN: 12 SC: 0 IC: 0
SR: 97.59% IR: 99.80% OR: 98.69%
But with a 3.6 version and db4, training on the same data looks like
this ("doesn't work!"):
Training dspam: . = true positive/negative, * = false negative, # = false
positive:
*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.
===============================================
andy TP: 0 TN: 498 FP: 0 FN: 498 SC: 0 IC: 0
SR: 0.00% IR: 100.00% OR: 50.00%
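To make the two summary lines easier to compare, here is a small Python sketch that recomputes the percentages from the raw counts. The formulas are an assumption inferred from the numbers in the two runs above, not taken from the dspam source: SR is treated as the share of spam that was caught, IR as the share of ham that was passed, and OR as the share of all mails classified correctly. The function name rates is made up for this example.

```python
def rates(tp, tn, fp, fn):
    """Return (SR, IR, OR) in percent from dspam_stats-style counters.

    Assumed formulas (inferred from the posted output, not from dspam):
      SR = TP / (TP + FN)     share of spam detected
      IR = TN / (TN + FP)     share of ham passed
      OR = (TP + TN) / total  share of all mails classified correctly
    """
    spam_total = tp + fn
    ham_total = tn + fp
    total = spam_total + ham_total
    sr = 100.0 * tp / spam_total if spam_total else 0.0
    ir = 100.0 * tn / ham_total if ham_total else 0.0
    overall = 100.0 * (tp + tn) / total if total else 0.0
    return sr, ir, overall


def format_rates(tp, tn, fp, fn):
    """Format the three rates like the summary line in the test output."""
    sr, ir, overall = rates(tp, tn, fp, fn)
    return "SR: %.2f%% IR: %.2f%% OR: %.2f%%" % (sr, ir, overall)
```

With the counters from the working run (TP 486, TN 497, FP 1, FN 12) and from the failing run (TP 0, TN 498, FP 0, FN 498), these formulas reproduce the percentages shown above. An OR of 50% with SR at 0% means the filter classified everything as ham.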
As you can see, dspam does not learn the way it does with hash_drv. I suppose
something has been wrong with libdb4_drv since 3.6. I doubt that db4 is broken
on my system, since other applications run fine with db4 (though I haven't
done a regression test on db4 yet).
What do you think?
I'll try to run the same tests with mysql_drv later today.
regards,
Andreas Neuhaus