Re: [dspam-users] Different results on different platforms

From: Peter Larkowski <peter@larkowski.net>
Date: Thu Nov 15 2007 - 04:06:26 CET

On Nov 14, 2007, at 5:28 PM, Ion-Mihai Tetcu wrote:

>>
>> Since dspam uses a lot of floats, some minor calculation bug could
>> show up and translate to the *small* gap your stats show.

I think 87 missed messages compared to 25 is more than a small gap,
especially since the ultrasparc appears not to be learning (i.e. it is
still missing messages consistently at the end of the training),
whereas the athlon misses almost all 25 of its messages in the first
half of the training. I don't have stats to back this up, just what
I observed.
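
To illustrate how a tiny float difference could flip a spam/ham decision, here is a minimal sketch. It is not dspam code, and the threshold value is made up for the illustration. It only shows that adding the same terms in a different order can land on either side of a cutoff:

```python
# Illustration only: not taken from dspam. The threshold is hypothetical.
a = (0.1 + 0.2) + 0.3   # one evaluation order
b = 0.1 + (0.2 + 0.3)   # same terms, different grouping
threshold = 0.6          # made-up decision cutoff

print(a == b)                         # the two sums differ in the last bit
print(a > threshold, b > threshold)   # so they fall on different sides
```

A single flip like this would be a small effect on its own; whether it can compound during training is a separate question.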
>
>
> Hmmm.
>
> So you are sure you use exactly the same corpus on both platforms ?
> And also that you pipe the corpus in the same order on both
> platforms ?

Yes, I checked the logs

> You use the same mysql version on both platforms, right ?

Yes, mysql50 from ports

>
> Since I don't have access to a sparc, could you please send me your
> sparc dspam build log, uname -a for both systems, /etc/make.conf for
> both systems, gcc version, etc.

sparc64: $ uname -a
FreeBSD sparc64.larkowski.net 6.3-PRERELEASE FreeBSD 6.3-PRERELEASE #0: Thu Oct 25 22:11:38 EDT 2007 root@pc.larkowski.net:/usr/obj/sparc64/usr/src/sys/SPARC64 sparc64

amd64: # uname -a
FreeBSD pc.larkowski.net 6.3-PRERELEASE FreeBSD 6.3-PRERELEASE #0: Fri Oct 26 22:20:30 EDT 2007 root@pc.larkowski.net:/usr/obj/amd64/usr/src/sys/PC amd64

I'd have to reboot to give you the i386 uname -a, but it's the same box
as amd64 and it was built from the same src tree, so it's going to be
about the same.

>
> If you have access to an amd64 it would be interesting to see if diffs
> exist on it also (in which case I'd "blame" something in dspam code).
>
>
> I don't have time to make a corpus and run an i386 vs amd64 test
> myself (eventually if you send me your corpus I'd give it a try).
>

I ran this under amd64 (again same version of freebsd, ports, dspam,
mysql, etc). I get different stats again (freebsd/amd64 seems to be
in the middle of the other 2):

FreeBSD/amd64
                 TP True Positives: 950
                 TN True Negatives: 999
                 FP False Positives: 1
                 FN False Negatives: 50
                 SC Spam Corpusfed: 0
                 NC Nonspam Corpusfed: 0
                 TL Training Left: 1500
                 SHR Spam Hit Rate: 95.00%
                 HSR Ham Strike Rate: 0.10%
                 OCA Overall Accuracy: 97.45%

FreeBSD/i386
                 TP True Positives: 977
                 TN True Negatives: 998
                 FP False Positives: 2
                 FN False Negatives: 23
                 SC Spam Corpusfed: 0
                 NC Nonspam Corpusfed: 0
                 TL Training Left: 1500
                 SHR Spam Hit Rate: 97.70%
                 HSR Ham Strike Rate: 0.20%
                 OCA Overall Accuracy: 98.75%

FreeBSD/sparc64
                 TP True Positives: 913
                 TN True Negatives: 1000
                 FP False Positives: 0
                 FN False Negatives: 87
                 SC Spam Corpusfed: 1
                 NC Nonspam Corpusfed: 0
                 TL Training Left: 1500
                 SHR Spam Hit Rate: 91.30%
                 HSR Ham Strike Rate: 0.00%
                 OCA Overall Accuracy: 95.65%
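
For anyone checking the numbers, the percentages in the three blocks are consistent with the usual definitions below. This is a sketch based on my reading of the figures, not taken from the dspam source, and the function name `dspam_rates` is hypothetical:

```python
def dspam_rates(tp, tn, fp, fn):
    """Return (SHR, HSR, OCA) as percentages from raw classification counts."""
    shr = 100.0 * tp / (tp + fn)                    # spam caught / all spam
    hsr = 100.0 * fp / (tn + fp)                    # ham flagged / all ham
    oca = 100.0 * (tp + tn) / (tp + tn + fp + fn)   # correct / all messages
    return shr, hsr, oca

# Counts copied from the three runs above: (TP, TN, FP, FN)
runs = {
    "amd64": (950, 999, 1, 50),
    "i386": (977, 998, 2, 23),
    "sparc64": (913, 1000, 0, 87),
}

for platform, counts in runs.items():
    shr, hsr, oca = dspam_rates(*counts)
    print(f"{platform:8s} SHR {shr:.2f}%  HSR {hsr:.2f}%  OCA {oca:.2f}%")
```

Each run classified 2000 messages in total (1000 spam, 1000 ham), so the platforms differ only in how many ended up on the wrong side.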

I think I'll try the hash backend and make sure this isn't mysql
related. Otherwise, I can send buildlogs if you'd like. I'm not
really sure how big a deal this really is, but it's interesting at
least.

-Peter
Received on Thu Nov 15 04:06:41 2007
