mail filter program with
web-based tuning tool
to deliver only wanted mail
by using block/buddy lists and
naive Bayesian statistical analysis.
Version 3.1
(A Zeta Version)
Download blmom31.zip or blmom31.tgz for the source.
See the readme.txt file.
The project at SourceForge.net is jady-mailomat.
Use your mail software as usual (Eudora, Netscape, Outlook, etc.).
You'll see the mail that's yours.
But first, before you get mail, go to your web browser and visit Buddy-List Mail-O-Mat to filter your mail. No big deal if you don't. You'll just get any spam delivered since the last filtering. Your mail is filtered periodically (set up by your webmaster, usually on the hour).
Whenever you give your mail address at a web site, visit Buddy-List Mail-O-Mat and look at your mail in case they sent you something spam-like that you need to see. You'll probably want to add their mail address to your buddy list.
It's simple. Here's what happens when mail arrives.
Mail is delivered to your mail address, as normal. Periodically, your mail is filtered (set up by your webmaster, usually on the hour). Your mail is also filtered whenever you visit Buddy-List Mail-O-Mat. This filtering classifies each message as blocked, from a buddy, containing executables, spam-like, or good mail. Mail that contains executables has attachments with an extension of bat, cmd, com, cpl, exe, pif, scr, vb, vbe, vbs, or wsf. Everything except buddy and good mail is deleted from your mailbox. But not to worry: all mail is preserved and accessible through Buddy-List Mail-O-Mat.
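Here's a rough sketch of that sorting in Python. It's an illustration only, not the program's actual source. The names classify, Verdict and is_kept are made up for this example. The order of the checks is also an assumption: the text lists the classes but doesn't say which one wins when two apply. This sketch checks executables before buddies, since even buddies are asked to zip their programs.

```python
from enum import Enum


class Verdict(Enum):
    BLOCKED = "blocked"
    BUDDY = "buddy"
    EXECUTABLE = "executable"
    SPAM = "spam"
    GOOD = "good"


def classify(sender, has_executable, looks_like_spam, block_list, buddy_list):
    """Sort one message into a class.

    The order of the checks below is an assumption made for this sketch.
    block_list and buddy_list are sets of lower-case mail addresses.
    """
    sender = sender.strip().lower()
    if sender in block_list:
        return Verdict.BLOCKED
    if has_executable:
        return Verdict.EXECUTABLE
    if sender in buddy_list:
        return Verdict.BUDDY
    if looks_like_spam:
        return Verdict.SPAM
    return Verdict.GOOD


def is_kept(verdict):
    """Only buddy and good mail stays in the mailbox."""
    return verdict in (Verdict.BUDDY, Verdict.GOOD)
```

With this ordering, a buddy's message stays in the mailbox even when the statistical analysis thinks it looks like spam.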
Basically, use your mail software as usual (Eudora, Netscape, Outlook, etc.).
You'll see only mail that's yours. NO SPAM! Well, almost. Mostly.
Every once in a while, using your web browser to visit Buddy-List Mail-O-Mat, look at your mail to see what you're missing. Maybe do this weekly. Also, do this whenever you sign up for some mailing list or whenever you buy something online.
Finally, every once in a while, using your web browser to visit Buddy-List Mail-O-Mat, look at your mailboxes to tune the naive Bayesian statistical analysis program. Maybe do this weekly or whenever the spam gets too heavy.
Start your web browser and go to the web page assigned to you by your webmaster. If your web site is using a private security certificate, you may need to reply "Yes" at the security alert which says the security certificate has expired or is not yet valid. This will keep your web browser session hidden from prying eyes. And, when prompted, enter the userid and password provided by your webmaster.
You should see something like this:
The main Buddy-List Mail-O-Mat web page has options for controlling the listing.
We're going to view a message.
Now suppose you clicked on the message size "1657" next to the 5th entry titled "Introducing the new P Patch, say bye bye to pills!" This message was recognized as spam and was deleted from the mailbox.
You should see something like this, in a new browser window:
Okay, so this isn't something you need to see. But suppose it was a good message you would want. Click the "
While viewing a mail message you can see it as a web page (only, of course, if it was sent in that format, like the example above). To this purpose, there's a "
Now let's go back to the list window. We're going to do some buddy-list work.
Now suppose you clicked on the from address "firstname.lastname@example.org" on the 4th entry titled "you may want to know".
You should see something like this, in a new browser window:
You could click on the "
But really, don't bother adding spam mailers to your block list. Just let the naive Bayesian statistical analysis handle them.
Note: If you have a buddy who sends you a lot of spam but a few good messages, consider removing them from your buddy list and letting the naive Bayesian statistical analysis handle them. (Really, your buddy may not be a spammer; their address may have been forged. See Forged Address.)
Users in a domain can share a buddy list. This shared buddy list can be fed mail addresses from sales order pages and from a web page which solicits visitor registrations. There is a link on the list window in the "To...From" heading. Click there and you should see something like this, in a new browser window:
Now let's go back to the list window.
We're going to tune the naive Bayesian statistical analysis program.
Mark the radio buttons in the mail and spam columns. You don't need to mark all messages, just mark as spam those that you don't want and mark as mail those that you do want. Then click the "mail/spam TUNE Bayesian" button.
Every once in a while, mark all good messages as mail and click the "mail/spam TUNE Bayesian" button. (Of course, you already mark and tune the spam.) This helps tune the naive Bayesian statistical analysis program to the kind of mail you like to receive and thereby makes for more accurate separation of mail from spam.
Suppose you clicked on a "spam" radio button and changed your mind, but you don't want to mark the adjacent "mail" radio button; you just want to un-mark the "spam" radio button. There is a "RESET radio buttons" button at the bottom. Note that it un-marks all radio buttons.
Buddy-List Mail-O-Mat implements naive Bayesian statistical analysis filtering technology. It's a method that uses statistics and learns by example. You tell it which mail is spam and which is not. This method is currently extremely effective, but some pirates will develop messages that defeat statistical analysis. Nevertheless, this method will remain useful, because it will greatly reduce the number of spam messages you receive. Also, this method generates very few false positives (good mail marked as spam). The result is a low-maintenance mail-handling system.
Tokens are used to rate messages. A token is simply a word.
Correctly spelled words. Misspelled words. Invented words. Even words with numbers in them.
Tokens are any sequence of letters and numbers separated by spaces, commas, periods, etc.
Lone numbers are not tokens. A token may be no longer than 23 characters and no shorter than three.
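Those token rules can be sketched in a few lines of Python. This is an illustration, not the program's own tokenizer. The function name tokenize is made up, and three details are assumptions the text doesn't settle: tokens are lower-cased, only ASCII letters and digits count, and a run that is too long is skipped rather than cut short.

```python
import re

MIN_TOKEN_LENGTH = 3
MAX_TOKEN_LENGTH = 23


def tokenize(text):
    """Split text into tokens following the rules described above.

    Assumptions of this sketch: ASCII only, lower-casing, and over-long
    runs are dropped instead of truncated.
    """
    tokens = []
    # Any run of letters and digits is a candidate; everything else separates.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if word.isdigit():
            continue  # lone numbers are not tokens
        if MIN_TOKEN_LENGTH <= len(word) <= MAX_TOKEN_LENGTH:
            tokens.append(word.lower())
    return tokens
```

For example, tokenize("Buy V1agra now, only 1999 dollars at x.") keeps "buy", "v1agra", "now", "only" and "dollars", and drops "1999", "at" and "x".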
The tuner tokens list is a list of tokens found in mail and spam, collected whenever you do Bayesian tuning.
The mail and spam columns show counts of messages that contain each token.
The score column shows scores used to rate messages.
A question symbol indicates a token not used for rating
because there are fewer than five total messages with that token
or the token score is greater than ten and less than ninety.
A token score is the spam count divided by the sum of the spam and mail counts, multiplied by 100.
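To make the counting and scoring concrete, here is a small Python sketch. It follows the rules stated above, but the function names (new_counts, record_message, token_score, is_used_for_rating) and the way the counts are stored are made up for this example, not taken from the program.

```python
from collections import defaultdict


def new_counts():
    """token -> [spam_count, mail_count], counted in messages."""
    return defaultdict(lambda: [0, 0])


def record_message(counts, tokens, is_spam):
    """Tuning step: count each distinct token once per message."""
    for token in set(tokens):
        counts[token][0 if is_spam else 1] += 1


def token_score(spam_count, mail_count):
    """Spam count divided by spam plus mail counts, times 100."""
    total = spam_count + mail_count
    if total == 0:
        raise ValueError("token has not been seen in any message")
    return 100.0 * spam_count / total


def is_used_for_rating(spam_count, mail_count):
    """False for tokens that would show the question symbol in the list."""
    if spam_count + mail_count < 5:
        return False  # fewer than five total messages
    score = token_score(spam_count, mail_count)
    return not (10 < score < 90)  # middling scores are ignored
```

A token seen in 9 spam messages and 1 good message scores 90 and is used for rating. A token seen in 3 of each scores 50 and is ignored.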
There's only one thing you can do with this tuner tokens list, besides satisfy your curiosity.
You can delete all tokens with fewer than some limit of messages — use the form at the top.
This is more than just a curiosity. You can look here after doing Bayesian tuning to see if it had any effect on the score. If not, you might want to repeat the tuning several times. Understand that a message is scored by looking at the top ten spam-like tokens and the top ten mail-like tokens, generating an average score. Tokens used about as frequently in spam as in mail are ignored. This is the essence of naive Bayesian statistical analysis.
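Here is one way that scoring could look in Python. Treat it as a sketch under assumptions, not the program's real code. The name message_score is made up. "Spam-like" is taken to mean a token score of 90 or more and "mail-like" a score of 10 or less. The text doesn't say what average score counts as spam, so the sketch only returns the average.

```python
def message_score(message_tokens, counts, top_n=10):
    """Average the scores of the most spam-like and most mail-like tokens.

    counts maps token -> (spam_count, mail_count). Returns None when the
    message has no token usable for rating.
    """
    scores = []
    for token in set(message_tokens):
        if token not in counts:
            continue
        spam_count, mail_count = counts[token]
        total = spam_count + mail_count
        if total < 5:
            continue  # too few messages to trust this token
        score = 100.0 * spam_count / total
        if 10 < score < 90:
            continue  # used about as often in spam as in mail
        scores.append(score)

    spam_like = sorted((s for s in scores if s >= 90), reverse=True)[:top_n]
    mail_like = sorted(s for s in scores if s <= 10)[:top_n]
    chosen = spam_like + mail_like
    if not chosen:
        return None
    return sum(chosen) / len(chosen)
```

A high average means the message leans toward spam and a low one means it leans toward mail.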
When Buddy-List Mail-O-Mat filters mail, it reads messages from your POP3 mailbox.
Then it analyzes each message.
If a message is blocked, spam, or a suspected virus, Buddy-List Mail-O-Mat deletes it from your POP3 mailbox;
otherwise it's left there for you to retrieve using your mail software.
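For the curious, the read-analyze-delete loop can be sketched with Python's standard poplib and email modules. This is not the program's own code. The names filter_mailbox and run_once, and the is_unwanted and archive callbacks, are hypothetical. The archive step stands in for the way all mail is preserved for viewing; how the program really stores it isn't described here.

```python
import email
import poplib


def filter_mailbox(conn, is_unwanted, archive):
    """Read every message, keep a copy, and delete the unwanted ones.

    conn is a logged-in poplib.POP3 or poplib.POP3_SSL connection.
    is_unwanted(message) and archive(message) are hypothetical callbacks.
    Returns the message numbers marked for deletion.
    """
    count, _mailbox_size = conn.stat()
    deleted = []
    for number in range(1, count + 1):  # POP3 numbers messages from 1
        _response, lines, _octets = conn.retr(number)
        message = email.message_from_bytes(b"\r\n".join(lines))
        archive(message)  # preserve everything before deciding
        if is_unwanted(message):
            conn.dele(number)
            deleted.append(number)
    return deleted


def run_once(host, user, password, is_unwanted, archive):
    """Connect, filter, and disconnect (host and login are placeholders)."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    try:
        return filter_mailbox(conn, is_unwanted, archive)
    finally:
        conn.quit()  # the server removes marked messages at quit
```

Messages that aren't marked simply stay on the server, so your mail software picks them up as usual.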
Bad people send dangerous mail, which comes in two forms:
executable program attachments and pernicious links.
Mail with executable attachments
(any file with an extension of bat, cmd, com, cpl, exe, pif, scr, vb, vbe, vbs, or wsf)
almost always contains viruses and other nasties.
So, we avoid it altogether.
If you have a buddy who wants to send you a program,
have them put it in a zip file or the like.
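Checking for those extensions is simple to sketch with Python's standard email module. The function name executable_attachments is made up for this example, and the real program may look at attachments differently. Note that only the last extension of a file name is checked, so a zip file passes no matter what it holds.

```python
import os
from email.message import EmailMessage

EXECUTABLE_EXTENSIONS = {
    "bat", "cmd", "com", "cpl", "exe", "pif", "scr", "vb", "vbe", "vbs", "wsf",
}


def executable_attachments(message):
    """Return the file names of attachments with a listed extension."""
    found = []
    for part in message.walk():  # visits every part of a multipart message
        filename = part.get_filename()
        if not filename:
            continue
        extension = os.path.splitext(filename)[1].lstrip(".").lower()
        if extension in EXECUTABLE_EXTENSIONS:
            found.append(filename)
    return found


# Build a small test message with one risky and one harmless attachment.
demo = EmailMessage()
demo["Subject"] = "you may want to know"
demo.set_content("See attached.")
demo.add_attachment(b"MZ", maintype="application", subtype="octet-stream",
                    filename="invoice.pdf.EXE")
demo.add_attachment(b"PK", maintype="application", subtype="zip",
                    filename="program.zip")
print(executable_attachments(demo))
```

The extension is compared in lower case, so "invoice.pdf.EXE" is caught while "program.zip" is let through.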
There remains one problem: Some pirates forge the "from" address,
so you may receive spam that's apparently sent by a buddy.
While that's annoying,
a greater unseen danger is that pirates will forge your address in their spam,
damaging your reputation.
Currently, there is no defense against this problem,
known as "joe-jobbing" or "email spoofing." But there is hope.
A new mail paradigm, called SPF (Sender Policy Framework),
is being developed and implemented.
It's designed to stop mail address forgery.
It will require some changes to your mail habits and cause some temporary pain.
I will do my best to help minimize the pain.
SPF won't eliminate spam, but it will help assure that the sender of a message is who they claim to be.
So, even when SPF is fully implemented, we'll still need Buddy-List Mail-O-Mat!
You're on vacation, visiting family, whatever. You're not at home.
You have your notebook computer and need to check your mail but must use a dial-up connection.
You want to get on and off quickly.
A special version of Buddy-List Mail-O-Mat is available — ask your webmaster.
This Copyleft is a license granting everyone the freedom
to use, modify, translate and distribute the copyrighted work,
or any work derived from it, provided these terms are unchanged.
This Copyleft permits additional Copyleft copyright notices.
This Copyleft extends to all representations of the work,
including, for example, a binary compilation of a program.
This Copyleft permits remuneration for distribution of the work
and for services relating to use of the work.
This Copyleft makes no warranty, assurance nor guarantee as to usefulness or correctness;
such is the duty of whoever uses or services the work.
This Copyleft does not put the work into the public domain.
Where this Copyleft is prohibited by law,
the work shall be considered copyright by the original copyright owner with All Rights Reserved.