Buddy-List Mail-O-Mat

An open-source mail filter program with a web-based tuning tool that delivers only wanted mail by using block/buddy lists and naive Bayesian statistical analysis.

Version 3.1 (A Zeta Version)
Download blmom31.zip or blmom31.tgz for the source.
The readme.txt file.
Project at SourceForge.net is jady-mailomat.



User Guide

Topics:
Quick Guide
How It Works
What You Do
Visit Buddy-List Mail-O-Mat
  • View A Message
  • Buddy-List Work
  • Shared Buddy-List
  • Bayesian Tuning
Technical Details

Hints:
mail registration
online purchases
subscription mail
web page warning
statistical analysis
tuner tokens
executables
forged addresses
on the road


Quick Guide

Use your mail software as usual (Eudora, Netscape, Outlook, etc.).
You'll see the mail that's yours.

But first, before you get mail, use your web browser to visit Buddy-List Mail-O-Mat and filter your mail. It's no big deal if you don't; you'll just get any spam delivered since the last filtering. Your mail is also filtered periodically (as set up by your webmaster, usually on the hour).

Whenever you give your mail address to a web site, visit Buddy-List Mail-O-Mat and look at your mail, in case they sent you something that looks like spam but that you need to see. You'll probably want to add their mail address to your buddy list.




How It Works

It's simple. Here's what happens when mail arrives.

Mail is delivered to your mail address as usual. Periodically (as set up by your webmaster, usually on the hour), and also whenever you visit Buddy-List Mail-O-Mat, your mail is filtered. Filtering classifies each message as blocked, from a buddy, containing executables, looking like spam, or good mail. A message contains executables if it has an attachment with an extension of bat, cmd, com, cpl, exe, pif, scr, vb, vbe, vbs, or wsf. Everything except buddy mail and good mail is deleted from your mailbox. But not to worry: all mail is preserved and can be viewed through Buddy-List Mail-O-Mat.
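The classification step can be sketched in Python. This is a minimal illustration under stated assumptions, not the program's own code: the function and parameter names are hypothetical, the block and buddy lists are plain sets of exact lowercase addresses (the real lists accept patterns), and the order of the checks is assumed, except that executables are checked before buddies, since the Executables section asks even buddies to zip their programs. The threshold of 90 comes from the tuner tokens page.

```python
from enum import Enum

# Extensions this guide treats as executables.
EXECUTABLE_EXTENSIONS = {"bat", "cmd", "com", "cpl", "exe", "pif",
                         "scr", "vb", "vbe", "vbs", "wsf"}
SPAM_THRESHOLD = 90  # a message scoring above this is considered spam


class Verdict(Enum):
    BLOCK = "block"
    VIRUS = "virus"  # the listing labels executable-bearing mail "virus"
    BUDDY = "buddy"
    SPAM = "spam"
    MAIL = "mail"


def has_executable(attachment_names):
    """True if any attachment name ends in an executable extension."""
    for name in attachment_names:
        _, dot, ext = name.rpartition(".")
        if dot and ext.lower() in EXECUTABLE_EXTENSIONS:
            return True
    return False


def classify(sender, attachment_names, bayes_score, block_list, buddy_list):
    """Hypothetical classifier; the check order is an assumption."""
    sender = sender.lower()  # case is ignored when comparing addresses
    if sender in block_list:
        return Verdict.BLOCK
    if has_executable(attachment_names):
        return Verdict.VIRUS  # even a buddy's .exe is caught here
    if sender in buddy_list:
        return Verdict.BUDDY
    if bayes_score > SPAM_THRESHOLD:
        return Verdict.SPAM
    return Verdict.MAIL


def stays_in_mailbox(verdict):
    """Only buddy mail and good mail are left for your mail software."""
    return verdict in (Verdict.BUDDY, Verdict.MAIL)
```

Putting the buddy check after the Bayesian check instead would let spam-like messages from buddies be deleted; the Buddy List section's advice to remove spammy buddies suggests buddy status wins over the score.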



What You Do

Basically, use your mail software as usual (Eudora, Netscape, Outlook, etc.).
You'll see only mail that's yours. NO SPAM! Well, almost. Mostly.

Every once in a while, maybe weekly, use your web browser to visit Buddy-List Mail-O-Mat and look at your mail to see what you're missing. Also do this whenever you sign up for a mailing list or buy something online.

Buy Online, Register, Subscription

When you buy something online or register your mail address with some website, visit Buddy-List Mail-O-Mat after a few minutes to see if they sent you something. If you want to receive future messages, add them to your buddy list.



Finally, every once in a while, use your web browser to visit Buddy-List Mail-O-Mat and look at your mailboxes to tune the naive Bayesian statistical analysis program. Maybe do this weekly, or whenever the spam gets too heavy.



Visit Buddy-List Mail-O-Mat

Start your web browser and go to the web page assigned to you by your webmaster. If your web site uses a private security certificate, you may need to answer "Yes" at a security alert saying the certificate has expired or is not yet valid. The secure connection keeps your web browser session hidden from probing eyes. When prompted, enter the user ID and password provided by your webmaster.
      You should see something like this:
Buddy-List Mail-O-Mat
Getting your mail .......
LOG messages:
Your mail has been filtered.
»»» >>> Go get your mail now. <<< «««

SETUP   HELP
Which day(s)? today yesterday both all since: until:
Which messages? current previous(...)   Sort by? date/time(new-first) subject from
5 messages for user@domain.com

Columns:
  mail, spam   radio buttons [to tune, mark mail or spam, and use TUNE button below.]
  rate         [click rate to see tokens used to rate message]
  size         [click size to view message]
  Subject      (blue=mail red=spam black=virus)
  date, time
  To, From     [click here to open shared buddy list]
               [click address to open buddy list]

rate   size  date   time   To               From                 Subject
block  1091  03/29  06:31  user@domain.com  whadm@deals.com      got a deal for you
virus  37K   03/29  06:10  user@domain.com  bjpau@email.si       this will help you look good and feel great
buddy  2015  03/29  05:27  user@domain.com  friend@aol.com       okay tomorrows fine
47     1119  03/29  05:08  user@domain.com  jblue_wa@wanadoo.fr  you may want to know
99     1657  03/29  05:01  user@domain.com  P493PATCH@aol.com    Introducing the new P Patch, say bye bye to pills!


The main Buddy-List Mail-O-Mat web page has options for controlling the listing.
        - "Which messages?" selects which log file to list. A log file holds summary information about messages and is constantly added to as mail arrives. The log file is cycled (usually weekly): the current log file becomes the previous log file and a new log file is started. Old "previous" log files are eventually purged. The exact date and time of the cycle is shown in parentheses after "previous".
        - "Which day(s)?" is pretty obvious: today, yesterday, both, all, or a since/until range.
        - "Sort by?" is also pretty obvious: date/time (newest first), subject, or sender.

View Message

We're going to view a message.
Now suppose you clicked on the message size "1657" next to the 5th entry titled "Introducing the new P Patch, say bye bye to pills!" This message was recognized as spam and was deleted from the mailbox.
      You should see something like this, in a new browser window:
Buddy-List Mail-O-Mat
mail this message to me   view mail as a web page
From bjpau@email.si Mon Oct 20 23:37:41 2003
Received: from P493PATCH@aol.com (aol.com [1.2.3.4])
    by domain.name (8.12.9/8.11.0) with SMTP id xxxxxxxx
    for <name@your.com>; Mon, 20 Oct 2003 23:37:40 -0600
Date: Mon, 20 Oct 2003 23:37:39 -0600
Message-Id: <200310210537.xxxxxxxx@domain.name>
X-Authentication-Warning: domain.name: name owned process doing -bs
Received: (qmail 23440 invoked from network); 19 Oct 2003 09:02:04 -0000
Received: from 186.1-1.2.3.4.telemar.net.br (HELO mailhost.hetnet.nl) (1.2.3.4)
    by aol.com with SMTP; 19 Oct 2003 09:02:04 -0000
From: "Kittie" <P493PATCH@aol.com>
To: "kknzoeglklx@cs.com" <kknzoeglklx@cs.com>
Subject: Introducing the new P Patch, say bye bye to pills!
MIME-Version: 1.0
Content-Type: text/html

<html><body>
<p><a href="http://www.decode6.com/mi/"><img
border="0" src="http://www.timehere.com/h.jpg"></a></p>
<a href="http://www.decode6.com/h.html">vwhtgu</a>
</body></html>

Okay, so this isn't something you need to see. But suppose it were a good message you wanted. Click the "mail this message to me" link and it will be sent to you; then go to your mail program and get the message. You'll probably also want to add the sender to your buddy list; see the Buddy List section below.
Web Page Warning

While viewing a mail message, you can also see it as a web page (only, of course, if it was sent in that format, like the example above). For this purpose, there's a "view mail as a web page" link next to the "mail this message to me" link. Warning: use this feature only if you practice strong security, because web-page messages can contain tricks dangerous to the health of your computer.

Buddy List

Now let's go back to the list window and do some buddy-list work.
Suppose you clicked on the From address "jblue_wa@wanadoo.fr" in the 4th entry, titled "you may want to know".
      You should see something like this, in a new browser window:
Buddy-List Mail-O-Mat
BLOCK/BUDDY LIST for user@domain.com

Notes:
  addresses only, no nicknames, no >'s, no <'s
  case is ignored; "a" and "A" appear identical
  ".*" represents zero or more of any symbol
  ".+" represents one or more of any symbol
  "." represents any one symbol
  "\." represents one period symbol (a dot)
  "^" marks start of address and "$" marks end
  "@*" will be replaced with "@.*"
  any "*" at beginning of address is removed
  ".+" is prefixed if address starts with "@"
  so, "@domain.com" becomes ".+@domain.com"

zap  buddy  block   mail addresses (sorted by domain, then name)
                    Mail (buddy in blue)   Spam (block in red)
( )  ( )    ( )     friend @aol.com
( )  ( )    ( )     mary @comcast.net
( )  ( )    ( )     whadm @deals.com
( )  ( )    ( )     joe @pipeline.com
( )  ( )    ( )     jane @rcn.com

You could click on the "add to buddy list" button to add "jblue_wa@wanadoo.fr" to your buddy list. But, more likely, you'd like to click on the "add to block list" button to add it to your block list. Also, notice the radio buttons to the left of each address. Click on the zap button to delete the address on that line. Click on the buddy button to move the address on that line to your buddy list. Click on the block button to move the address on that line to your block list. After making these selections, click the "process selections" button.
      But really, don't bother adding spam mailers to your block list. Just let the naive Bayesian statistical analysis handle them.
      Note: If you have a buddy who sends you a lot of spam but a few good messages, consider removing them from your buddy list and letting the naive Bayesian statistical analysis handle them. (Your buddy may not be a spammer at all; their address may have been forged. See Forged Address.)
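The Notes in the block/buddy list window describe list entries as patterns. One way to read those rules is sketched below in Python. It is an assumption, not the program's code: the names are hypothetical, the rewrites are applied in the order the Notes list them, and matching is a case-insensitive regular-expression search, unanchored unless the entry itself uses "^" or "$".

```python
import re


def normalize_entry(entry):
    """Apply the rewriting rules from the Notes box (as read here)."""
    pattern = entry.strip()
    pattern = pattern.replace("@*", "@.*")  # "@*" becomes "@.*"
    pattern = pattern.lstrip("*")           # leading "*" is removed
    if pattern.startswith("@"):
        pattern = ".+" + pattern            # "@domain.com" -> ".+@domain.com"
    return pattern


def entry_matches(entry, address):
    """Case-insensitive search of one list entry against an address."""
    return re.search(normalize_entry(entry), address, re.IGNORECASE) is not None
```

Under this reading, an entry like "joe@" would also match "bigjoe@pipeline.com"; writing "^joe@" pins it to the start of the address, and "\." keeps a dot from matching any symbol.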
Shared Buddy List

Users in a domain can share a buddy list. This shared buddy list can be fed mail addresses from sales order pages and from a web page that solicits visitor registrations. To open it, click the link in the "To...From" heading of the list window. You should see something like this, in a new browser window:
Buddy-List Mail-O-Mat
SHARED BUDDY LIST for domain.com

zap   mail addresses (sorted by domain, then name)
( )   george @comcast.net
( )   harry @pipeline.com
( )   adam @rcn.com



Bayesian Tuning

Now let's go back to the list window.
We're going to tune the naive Bayesian statistical analysis program.
Mark the radio buttons in the mail and spam columns. You don't need to mark all messages, just mark as spam those that you don't want and mark as mail those that you do want. Then click the "mail/spam TUNE Bayesian" button.
       Every once in a while, mark all good messages as mail and click the "mail/spam TUNE Bayesian" button. (Of course, you already mark and tune the spam.) This tunes the naive Bayesian statistical analysis program to the kind of mail you like to receive, and so makes it more accurate at separating mail from spam.
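What tuning records can be sketched in Python. The tuner tokens page shows, for each token, how many marked mail messages and how many marked spam messages contain it; this sketch assumes tuning simply increments those per-message counts. The table layout and function names are hypothetical.

```python
from collections import defaultdict


def new_token_table():
    """token -> [MAILS, SPAMS], the two count columns on the tokens page."""
    return defaultdict(lambda: [0, 0])


def tune(table, message_tokens, is_spam):
    """Record one marked message; each distinct token counts once per message."""
    column = 1 if is_spam else 0
    for token in set(message_tokens):
        table[token][column] += 1
```

In this sketch, tuning the same message again adds to its counts again, which fits the later advice that you may want to repeat the tuning several times.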

Suppose you marked a "spam" radio button and changed your mind, but you don't want to mark the adjacent "mail" radio button; you just want to un-mark the "spam" one. Use the "RESET radio buttons" button at the bottom. Note that it un-marks all radio buttons.



Bayesian Statistics

Buddy-List Mail-O-Mat implements naive Bayesian statistical analysis filtering technology. It's a statistical method that learns by example: you tell it which mail is spam and which is not. The method is currently extremely effective, although some pirates will develop messages that defeat statistical analysis. Even so, it will remain useful, because it greatly reduces the number of spam messages you receive. It also generates very few false positives (good mail marked as spam). The result is a low-maintenance mail-handling system.



Tuner Tokens

Tokens are used to rate messages. A token is simply a word: correctly spelled words, misspelled words, invented words, even words with numbers in them. A token is any sequence of letters and numbers separated by spaces, commas, periods, etc. Lone numbers are not tokens. A token must be at least three and at most 23 symbols long.
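The token rules above can be sketched as a small Python tokenizer. Assumptions, labelled as such: "letters" means ASCII letters, tokens are lowercased (the tokens list shows lowercase words), and over-long words are dropped rather than truncated. The function name is hypothetical.

```python
import re

MIN_LEN, MAX_LEN = 3, 23  # token length limits from the rules above


def tokenize(text):
    """Return the distinct tokens in a message (ASCII letters assumed)."""
    tokens = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if word.isdigit():
            continue  # lone numbers are not tokens
        if MIN_LEN <= len(word) <= MAX_LEN:
            tokens.add(word.lower())  # lowercasing is an assumption
    return tokens
```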
        Go back to the list window. Click on the "tokens" link in the Subject heading. Be patient, it's a long list. You should see something like this, in a new browser window:

Buddy-List Mail-O-Mat - Tokens used to rate message.
LOG messages:
90 is the score value above which a message is considered spam.
with fewer than [   ] MAILS plus SPAMS

TOKEN           SCORE  MAILS  SPAMS
...
8bit               93      7     88
administration      ?      3      3
administrator       ?      0      1
appointment        99      0      7
astrology           6     16      1
based               ?     14     36
daily               9     20      2
drugs              99      0     11
huge               99      0      6
intimate           99      0      5
medications        99      0      5
pharmacy           99      0      8
pill               99      0      6
prescription       99      0     15
prior              99      0      8
range              99      0      5
sexual             99      0      5
viagra             99      0     10
...

It's a list of tokens found in mail and spam, collected whenever you do Bayesian tuning. The MAILS and SPAMS columns show how many messages contain each token. The SCORE column shows the score used to rate messages. A question mark indicates a token not used for rating, either because fewer than five messages in total contain it or because its score is greater than ten and less than ninety. A token score is the spam count divided by the sum of the spam and mail counts, multiplied by 100. There's only one thing you can do with this tuner tokens list, besides satisfying your curiosity: you can delete all tokens found in fewer than some number of messages, using the form at the top.
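The scoring rules above can be sketched in Python. The formula, the five-message minimum, and the 10 to 90 exclusion come from the text; rounding to the nearest integer and capping at 99 are assumptions inferred from the sample list (appointment has 0 mails and 7 spams, where the formula gives 100, yet it shows 99). Function names are hypothetical.

```python
def token_score(mails, spams):
    """SPAMS / (SPAMS + MAILS) * 100, rounded; the cap at 99 is assumed."""
    return min(99, round(100 * spams / (spams + mails)))


def used_for_rating(mails, spams, min_messages=5):
    """False where the tokens page would show '?'."""
    if mails + spams < min_messages:
        return False
    score = token_score(mails, spams)
    return not (10 < score < 90)


def prune(table, limit):
    """Delete tokens found in fewer than `limit` messages (the form at the top).

    `table` maps token -> (mails, spams).
    """
    for token in [t for t, (m, s) in table.items() if m + s < limit]:
        del table[token]
```

For example, 8bit (7 mails, 88 spams) scores round(88 / 95 × 100) = 93, and based (14 mails, 36 spams) scores 72, which falls between ten and ninety and so is shown with a question mark.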

Now let's go back to the list window and click on a score for some message.
You should see something like this, in a new browser window:

Buddy-List Mail-O-Mat - Tokens used to rate message.
LOG messages:
90 is the score value above which a message is considered spam.
That message has a score value of 99.

score  token
93     8bit
99     sexual
99     range
99     intimate
99     huge
99     medications
99     prior
99     appointment
99     viagra
99     prescription
99     drugs
99     pill

This is more than just a curiosity. After doing Bayesian tuning, you can look here to see whether it had any effect on the score. If not, you might want to repeat the tuning several times. A message is scored by taking the top ten spam-like tokens and the top ten mail-like tokens and averaging their scores. Tokens used about as frequently in spam as in mail are ignored. This is the essence of naive Bayesian statistical analysis.
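One reading of the message-scoring rule, sketched in Python. It assumes the input holds only usable token scores (each at most 10 or at least 90, since the others are ignored), takes the ten highest spam-like and ten lowest mail-like scores, and averages them. Returning 50 when no token is usable is an assumption, as are the names. Under this reading, the sample message (eleven tokens at 99 plus 8bit at 93) scores 99, matching the page, but other readings of "top ten" are possible.

```python
SPAM_THRESHOLD = 90  # a message scoring above this is considered spam
TOP_N = 10


def message_score(token_scores):
    """Average the ten most spam-like and ten most mail-like usable scores."""
    spam_like = sorted((s for s in token_scores if s >= 90), reverse=True)[:TOP_N]
    mail_like = sorted(s for s in token_scores if s <= 10)[:TOP_N]
    chosen = spam_like + mail_like
    if not chosen:
        return 50  # assumption: no usable tokens gives a neutral score
    return round(sum(chosen) / len(chosen))


def is_spam(score):
    return score > SPAM_THRESHOLD
```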




Technical Details

When Buddy-List Mail-O-Mat filters mail, it reads messages from your POP3 mailbox and analyzes each one. If a message is blocked, spam, or a virus, Buddy-List Mail-O-Mat deletes it from your POP3 mailbox; otherwise it's left there for you to retrieve using your mail software.
        When Buddy-List Mail-O-Mat reads messages from your POP3 mailbox, it first reads the message header; if it has already processed that message, it moves on to the next one. This avoids unnecessary processing.

Executables

Bad people send dangerous mail, which comes in two forms: executable program attachments and pernicious links. Mail with executable attachments (any file with an extension of bat, cmd, com, cpl, exe, pif, scr, vb, vbe, vbs, or wsf) almost always contains viruses and other nasties, so we avoid it altogether. If you have a buddy who wants to send you a program, have them put it in a zip file or something similar.
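The executable check can be sketched with Python's standard email package: walk every MIME part, take its attachment filename, and compare the last extension with the list above. This is an illustrative sketch, not the program's code; the function name is hypothetical, and only declared filenames are checked.

```python
import os
from email import message_from_bytes

EXECUTABLE_EXTENSIONS = {"bat", "cmd", "com", "cpl", "exe", "pif",
                         "scr", "vb", "vbe", "vbs", "wsf"}


def executable_attachments(raw_message):
    """Return names of attachments whose last extension is on the list."""
    message = message_from_bytes(raw_message)
    found = []
    for part in message.walk():
        filename = part.get_filename()  # from Content-Disposition or Content-Type
        if not filename:
            continue
        extension = os.path.splitext(filename)[1].lstrip(".").lower()
        if extension in EXECUTABLE_EXTENSIONS:
            found.append(filename)
    return found
```

Because only the last extension counts, "report.pdf.exe" is caught while a buddy's "tool.zip" passes, which is why zipping a program gets it through.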
       The other danger is a pernicious link. Here you must use common sense and wisdom. Don't click on any ol' link in a mail message. It can be dangerous. It can take you where you don't want to go. It can trick you into doing something stupid. For example, there are pirates out there who send messages that look like they come from PayPal. They don't. Click on the link and you're in dangerous territory — a place that looks just like PayPal asking you to enter your ID and password. Then you're in big trouble. Don't go there. If you're worried about your PayPal account, use your PayPal bookmark/favorites link and go there to see if PayPal really needs your attention.

Forged Address

There remains one problem: some pirates forge the "from" address, so you may receive spam that's apparently sent by a buddy. While that's annoying, a greater unseen danger is that pirates will forge your address in their spam, damaging your reputation. Currently, there is no defense against this problem, known as "joe-jobbing" or "email spoofing." But there is hope. A new mail paradigm, called SPF (Sender Policy Framework), is being developed and implemented. It's designed to stop mail address forgery. It will require some changes to your mail habits and cause some temporary pain; I will do my best to help minimize the pain. SPF won't eliminate spam, but it will help assure that mail really comes from the sender it claims. So, even when SPF is fully implemented, we'll still need Buddy-List Mail-O-Mat!
        Several Internet service providers (ISPs), like AOL, Netscape, MSN, etc., are implementing techniques to assure that mail sent through their systems has a "from" address that belongs to the sender. They're also implementing techniques to assure that mail received by their systems has a "from" address that belongs to the sender's system. (They may be using SPF.) So, set up your mail software to send mail through your ISP using the mail address your ISP assigned to you, and specify a "reply-to" with your Buddy-List Mail-O-Mat mail address. Also, consider forwarding mail from your ISP mail address to your Buddy-List Mail-O-Mat mail address, if your ISP offers this feature.
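You normally make this setting in your mail software, but the resulting headers can be shown with Python's standard email package. The addresses below are hypothetical: user@isp.example stands for the address your ISP assigned, and user@domain.com for your Buddy-List Mail-O-Mat address. Actually sending the message is left out.

```python
from email.message import EmailMessage


def compose(isp_address, mailomat_address, to, subject, body):
    """From is your ISP address; replies go to your Mail-O-Mat address."""
    msg = EmailMessage()
    msg["From"] = isp_address           # the address your ISP assigned to you
    msg["Reply-To"] = mailomat_address  # your Buddy-List Mail-O-Mat address
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Hypothetical addresses for illustration only.
example = compose("user@isp.example", "user@domain.com",
                  "friend@aol.com", "tomorrow", "okay tomorrows fine")
```

The "From" address then belongs to the ISP that sends the mail, while a buddy's reply still lands in the mailbox that Buddy-List Mail-O-Mat filters.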




On The Road

You're on vacation, visiting family, whatever. You're not at home. You have your notebook computer and need to check your mail, but must use a dial-up connection. You want to get on and off quickly. A special version of Buddy-List Mail-O-Mat is available; ask your webmaster.
        The "road" version of Buddy-List Mail-O-Mat is an abbreviated version with only the functions needed to check your mail. It leaves your mail on the server for when you get home. It shows a simplified list with links for viewing messages. Each message you view opens in a separate browser window, so you can open all the messages you want to see, log off, and then read each message offline.
        The "road" version doesn't filter any mail, to minimize your online connection time. Meanwhile, remember, the server filters mail hourly.
        If you reply to any messages (or otherwise send mail while on the road), don't use your Buddy-List Mail-O-Mat mail address; use your ISP mail address. (Hint: blind-copy yourself at your Buddy-List Mail-O-Mat address.) Otherwise you'll end up downloading your mail to your notebook (and not your desktop), and the mail will be deleted from the mail server.
        Here's what the "road" version looks like:
Buddy-List Mail-O-Mat On-The-Road
SETUP   HELP
Which day(s)? today yesterday both all since: until:
Which messages? current previous(...)   Sort by? date/time(new-first) subject from
5 messages for user@domain.com

Columns:
  rate
  size         [click size to view message]
  Subject      (blue=mail red=spam)
  date, time
  To, From

rate   size  date   time   To               From                 Subject
block  1091  03/29  06:31  user@domain.com  whadm@deals.com      got a deal for you
virus  37K   03/29  06:10  user@domain.com  bjpau@email.si       this will help you look good and feel great
buddy  2015  03/29  05:27  user@domain.com  friend@aol.com       okay tomorrows fine
47     1119  03/29  05:08  user@domain.com  jblue_wa@wanadoo.fr  you may want to know
99     1657  03/29  05:01  user@domain.com  P493PATCH@aol.com    Introducing the new P Patch, say bye bye to pills!



This Copyleft is a license granting everyone the freedom to use, modify, translate and distribute the copyrighted work, or any work derived from it, provided these terms are unchanged. This Copyleft permits additional Copyleft copyright notices. This Copyleft extends to all representations of the work, including, for example, a binary compilation of a program. This Copyleft permits remuneration for distribution of the work and for services relating to use of the work. This Copyleft makes no warranty, assurance nor guarantee as to usefulness or correctness; such is the duty of whoever uses or services the work. This Copyleft does not put the work into the public domain. Where this Copyleft is prohibited by law, the work shall be considered copyright by the original copyright owner with All Rights Reserved.

Copyleft Copyright © 2005 John G. Derrickson
freeVEDA.org/Copyleft   JGD@freeVEDA.org

EvALUEwARE™: full-featured product offered free; money-back guarantee before-the-fact.
Register for update notice by email. Purchase to get the newest version.
EvALUEwARE is a trademark of John G. Derrickson

Covered by the GNU General Public License. Note the OSDL Linux Legal Defense Fund.
Thanks for inspiration from Paul Graham and John Rauser.
modified Wednesday December 31, 1969