[lbo-talk] corporate censorship strikes again!

Matt lbo2 at beyondzero.net
Wed Oct 8 11:25:27 PDT 2003


On Wed, Oct 08, 2003 at 02:00:08PM -0400, Wojtek Sokolowski wrote:


> It is a tough situation. I _voluntarily_ set my MS Outlook filters to
> trash all incoming messages based on their content, such as ads offering
> V.i.a.g.r.a, p.e.n.i.s. e.n.l.a.r.g.e.m.e.n.t and similar services (the
> above spelling will most likely fool content based filters) as well as
> offers from N.i.g.e.r.i.a.n. banks. It works most of the time (unless
> spammers come up with some creative spelling) but it also filters out
> some lbo-talk messages, especially those with body parts references.
>
> I think that is the price we pay for spam.

The best weapon against spam right now is what is called Bayes filtering. It is the next generation of the keyword searching you describe above: instead of matching keywords you pick by hand, a Bayes filter learns from the spam and non-spam (called 'ham') that you actually receive. Basically, you feed a few hundred spams and a few hundred legit mails into the filter, and it identifies tokens (words and other fragments) that can be used to estimate the probability that a given message is spam rather than ham. The more mail you use for training, the better the results.
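For the curious, here is a minimal Python sketch of the idea, loosely modeled on the token-probability scheme in Paul Graham's "A Plan for Spam" but simplified (it does not weight ham counts or skip rare tokens the way Graham does). The tokenizer regex, the 0.4 probability for never-seen tokens, the 0.01/0.99 clamping and the 15 "most interesting" tokens are assumptions for illustration, not the internals of any particular filter. It counts how often each token appears in spam vs. ham, turns that into a per-token spam probability, and combines the strongest ones.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, $, apostrophes and hyphens.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'\-]+")


def tokenize(text):
    """Split a message into a set of lowercase tokens."""
    return {t.lower() for t in TOKEN_RE.findall(text)}


class BayesFilter:
    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Learn one message as spam or ham."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        """Estimated P(spam | token), clamped away from 0 and 1."""
        s = self.spam_counts[token] / max(self.n_spam, 1)
        h = self.ham_counts[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.4  # assumed value for tokens never seen in training
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, text, n_interesting=15):
        """Combine the most decisive token probabilities into one spam probability."""
        probs = [self.token_prob(t) for t in tokenize(text)]
        # Keep the tokens whose probability is farthest from a neutral 0.5.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        probs = probs[:n_interesting]
        if not probs:
            return 0.5
        # prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1 - p) for p in probs)
        return 1 / (1 + math.exp(log_ham - log_spam))


f = BayesFilter()
f.train("cheap viagra offer now", is_spam=True)
f.train("meeting notes for the union local", is_spam=False)
print(f.score("cheap viagra"), f.score("meeting notes"))
```

Working in log space only avoids floating-point underflow when many probabilities are multiplied; it gives the same answer as the product formula in the comment.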

I support a mail system for 5000 or so users, and we've implemented a site-wide Bayes filtering system. Mail is "auto-learned" based on how it scores in our conventional keyword/pattern-matching SpamAssassin software: if it scores very high, Bayes learns it as spam; if it scores very low, Bayes learns it as ham. It took a couple of months to learn enough to have a dramatic effect, but at one point 40% or so of our inbound mail was being flagged as spam. Now that our Bayes filters have learned 10000 or so spams and hams, Bayes has eliminated false positives and has pushed ratings for obvious spams so high that I just delete them and never deliver them to the user. Overall, only about 15% of the mail we receive is delivered to users as probable spam, which is a pretty amazing number for a large company (if I can pat myself on the back a bit).
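The auto-learning and delivery policy described above boils down to a few threshold checks. A minimal Python sketch, where all four cutoffs are hypothetical placeholders (the real numbers aren't given here, and SpamAssassin exposes its own configurable settings for auto-learning and tagging):

```python
# Hypothetical cutoffs on a SpamAssassin-style score; not the values used on any real system.
LEARN_HAM_BELOW = 0.1    # rule score this low: teach Bayes "ham"
LEARN_SPAM_ABOVE = 12.0  # rule score this high: teach Bayes "spam"
TAG_ABOVE = 5.0          # deliver, but marked as probable spam
DELETE_ABOVE = 20.0      # "obvious spam": discard, never deliver


def auto_learn_label(rule_score):
    """Decide whether to train Bayes on a message, based on the keyword/rule score only.

    Returns "spam", "ham", or None for the ambiguous middle, which is not learned.
    """
    if rule_score >= LEARN_SPAM_ABOVE:
        return "spam"
    if rule_score <= LEARN_HAM_BELOW:
        return "ham"
    return None


def route(total_score):
    """Pick a delivery action from the combined (rules + Bayes) score."""
    if total_score >= DELETE_ABOVE:
        return "delete"
    if total_score >= TAG_ABOVE:
        return "deliver_tagged"
    return "deliver"


for rule_score, total_score in [(0.0, 0.0), (6.0, 7.5), (15.0, 25.0)]:
    print(rule_score, auto_learn_label(rule_score), total_score, route(total_score))
```

Training only on clear-cut scores keeps borderline messages out of the training set, so a mistake near the cutoff isn't learned and then reinforced.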

There are personal Bayes anti-spam filters that you can use with Outlook, etc. See the FAQ link on the page below, which also has plenty of details on how Bayes works if one is interested.

Main page: http://www.paulgraham.com/antispam.html
Page with links to software (mostly free) for Outlook, etc.: http://www.paulgraham.com/spamfaq.html

Matt

--
PGP RSA Key ID: 0x1F6A4471       aim: beyondzero123
PGP DH/DSS Key ID: 0xAFF35DF2    icq: 120941588
http://blogdayafternoon.com      yahoo msg: beyondzero123

I am the resurrection and I am the life, I couldn't bring myself to hate you as I'd like.

-Ian Brown


