ListFilter automatically performs moderation on a mailing list. This eliminates the need for a human moderator, while ensuring that most off-topic or otherwise offensive messages are blocked.
The user of the system will be a mailing list owner (or the owner's assistant). Acceptable messages are forwarded immediately by the system to the user's mailing list for distribution, while any questionable messages are forwarded to the user (the list owner). Since over 95% of the messages are likely to be good ones, the list owner only has to deal with the remaining few percent.
The moderation program runs 24 hours per day, 7 days per week on our server. Message traffic can keep flowing at all times, without the list owner having to constantly check for new messages and forward them to the list.
All messages posted on a given mailing list are assigned a score, based on keywords detected, message size and other measurements. The list owner can tune these measurements to suit his list, and his moderation policy.
The list owner can set the policy, and let the moderation program enforce that policy. The list owner is free to be away from the list for long periods of time, secure in the knowledge that messages are flowing through with little delay, and each one is being checked carefully.
After signing up with Rapid Deployment Software, you will be given a login name and password for your mailing list.
The first thing you need to do after logging in is to go into General Settings (see below) and check that the e-mail address for rejected messages is correct.
If you have a Topica list, you will also have to enter the address for forwarding acceptable messages. This address includes a password that you create with Topica.
Initially, very few moderation parameters will be set up, other than a generic list of "bad" words, initialized for your convenience by Rapid Deployment Software (RDS). You may want to set up other parameters (see below) before you switch moderation over to this program.
When you are ready to have messages automatically moderated, you will simply tell your mailing list system that moderator@ListFilter.com is the moderator. If anything goes wrong, or you don't like the way the system is working, you can change the moderator back to your own address, or take the list out of moderation altogether.
At any time, you can adjust your moderation parameters. The changes take effect immediately, and will apply to the next message posted to your list.
In the following sections we describe how the scoring system works, and how you can customize it for your list. Everything can be controlled via a Web interface at ListFilter.com. Just log in and click the name of your list.
You have a list of "bad" words or phrases that might indicate an offensive or off-topic message. Initially, this list will contain over 100 words that we have chosen. Each word in the list carries a certain penalty value, as indicated by the number beside it. When you click "Save Changes" the system automatically sorts the list, and puts all words into upper case. For example,
1-800-, 12
CASINO, 8
DAMN, 5
OFFER EXPIRES, 10
You should modify this initial set of words and values to suit the subject of your mailing list, and your own policy. Some words are there to catch typical SPAM. Others are there to catch foul language. Depending on the nature of your mailing list, you might want to allow foul language, and if your list happens to be about gambling you would obviously delete the entry for the word "CASINO". Software developers might want to enter passwords, registration codes, etc. that some people on the list know, but others are expected to pay for. With ListFilter, you have a program running 24 hours per day, reading every message, ensuring that critical information does not get leaked, either accidentally or deliberately.
The moderation program checks every line in every message, looking for bad words. Bad words are matched without regard to upper/lower case, and a bad word longer than 3 characters also matches inside a longer word. So, CASINO would match CASINO, Casino, casino, casinos, etc. When a bad word on your list is 3 characters or fewer, it must match a complete word, not just a substring of a longer word. Otherwise short words might trigger too many spurious matches.
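The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual ListFilter code: the function name count_matches is hypothetical, and we assume that "a complete word" means the match is bounded by non-alphanumeric characters (a regular-expression word boundary).

```python
import re

def count_matches(bad_word: str, text: str) -> int:
    """Count case-insensitive occurrences of bad_word in text.

    Assumption: entries of 3 characters or fewer must match a whole word
    (regex word boundary); longer entries may match as a substring.
    """
    pattern = re.escape(bad_word)
    if len(bad_word) <= 3:
        pattern = r"\b" + pattern + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# A long entry matches inside longer words.
print(count_matches("CASINO", "Casino night at the casinos"))
# A short entry must stand alone as a word.
print(count_matches("BET", "I bet the alphabet is better"))
```

Here the first call counts both "Casino" and "casinos", while the second counts only the stand-alone "bet".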
The score goes up with each occurrence of any bad word, but the penalty is reduced by 20% for a second occurrence of the same bad word, and another 20% for the third occurrence, and so on. Eventually the penalty (rounded to the nearest integer) will reach zero. This helps to reduce false alarms. It's more significant to see two different, equally bad, words appearing in a message, than it is to see two occurrences of the same bad word.
For example, 3 occurrences of CASINO would generate a score of 8 + 6 + 5 (rounded off to the nearest integer), rather than 8 + 8 + 8 as you might expect.
So, for example, a message that contained CASINO and OFFER EXPIRES and 1-800- and another CASINO would be scored as:
8 + 10 + 12 + 6 = 36
If the threshold were 30, the message would not be approved, since 36 is above 30. It would be forwarded to you for evaluation.
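The arithmetic in these examples can be reproduced with a short Python sketch. It assumes that each 20% reduction is applied to the previous (unrounded) penalty, which agrees with the 8 + 6 + 5 example; the real program may compute the reduction differently. The function names are hypothetical.

```python
PENALTIES = {"1-800-": 12, "CASINO": 8, "DAMN": 5, "OFFER EXPIRES": 10}

def occurrence_penalty(base: int, occurrence: int) -> int:
    """Penalty for the nth occurrence (1-based) of the same bad word.

    Assumption: the 20% cut compounds, i.e. base * 0.8 ** (n - 1),
    rounded to the nearest integer with Python's round().
    """
    return round(base * 0.8 ** (occurrence - 1))

def bad_word_score(hits, penalties=PENALTIES) -> int:
    """Total penalty for a sequence of bad-word hits, in order of appearance."""
    seen = {}
    total = 0
    for word in hits:
        seen[word] = seen.get(word, 0) + 1
        total += occurrence_penalty(penalties[word], seen[word])
    return total

print([occurrence_penalty(8, n) for n in (1, 2, 3)])   # [8, 6, 5]
print(bad_word_score(["CASINO", "OFFER EXPIRES", "1-800-", "CASINO"]))  # 36
```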
Note that this scoring system is more subtle than simply saying that any message containing CASINO must be blocked, or any message with an 800-number must be blocked. We'll soon see how even more subtlety can be added to the system. The goal is to create an artificially intelligent moderator that can make the right decision in almost all cases, while letting you have the final say about rejecting messages.
In addition to a list of "bad" words, you also have a list of "good" words. These are on-topic words specific to the subject matter of your mailing list.
Suppose your mailing list was about baseball.
Your list of good words might contain:
BASEBALL, 3
INNING, 3
STRIKE, 2
BAT, 2
WIN, 1
LOSE, 1
Here you have a list of words that might come up in a discussion of baseball. Some words, like BASEBALL and INNING, are highly-specific to baseball and are unlikely to come up in other conversations. These words are given a score of 3. Other words, like STRIKE and BAT, are common in discussions of baseball, but could come up in other subjects. These words get a score of 2. Finally, words like WIN and LOSE are common to all sports and many other areas of life as well, so they are scored as just 1 point.
The moderation program will count up the occurrences of good words in each message. Each occurrence adds its score to the total. The score for a good word remains fixed, no matter how many times that word occurs in a message.
The program will compute the density of good word points, i.e. the total number of points divided by the total number of bytes in the message. An on-topic message should have a reasonable number of points for its size. A long message with a very low point-count will be assessed a large penalty for being off-topic. This penalty might not be enough to block the message, but undesirable off-topic messages often incur other penalties. For example, they may be SPAM messages containing typical SPAM phrases. Note that almost all messages are assigned some small penalty for being off-topic, usually from 1 to 10 penalty points, which is unlikely to cause the message to be blocked, assuming a threshold of 30.
To get accurate off-topic penalties, it's recommended that you have 100 or more on-topic words. The system automatically increases the weight of the off-topic penalty as you add more good words. With a small number of words, the program can't confidently say that a message is truly off-topic, so the penalty will be small.
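As a rough illustration of the density calculation, the Python sketch below totals the good word points in a message and divides by its size in bytes. It stops there on purpose: the formula that turns a density into an off-topic penalty is not described here, so none is guessed. Whole-word matching and the function names are assumptions made for the example.

```python
import re

GOOD_WORDS = {"BASEBALL": 3, "INNING": 3, "STRIKE": 2, "BAT": 2, "WIN": 1, "LOSE": 1}

def good_word_points(text: str, good_words=GOOD_WORDS) -> int:
    """Sum the fixed score of every good word occurrence.

    Assumption: good words are matched as whole words, ignoring case.
    """
    words = re.findall(r"[A-Z0-9']+", text.upper())
    return sum(good_words.get(word, 0) for word in words)

def good_word_density(points: int, message_bytes: int) -> float:
    """Good word points per byte of message (0.0 for an empty message)."""
    if message_bytes <= 0:
        return 0.0
    return points / message_bytes

message = "Strike three ended the inning. Strike one came early."
points = good_word_points(message)           # STRIKE 2 + INNING 3 + STRIKE 2 = 7
size = len(message.encode("utf-8"))
print(points, size, good_word_density(points, size))
```

Note that STRIKE counts for 2 points both times, since the score for a good word stays fixed.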
There's a Bad Person list where you can enter the names or e-mail addresses of people who tend to cause trouble. Each person can be assigned a number according to how bad he is. This number is subtracted from the threshold value, so the person is more likely to be blocked. For example,
JOHNSMITH@, 10
JUNKYARDDOG@YAHOO.COM, 20
SPAMMER@HOTMAIL, 250
The program will match these strings against the FROM: line in the e-mail message. If there's a match, the threshold will be lowered by the corresponding amount. The string you enter is normally the complete e-mail address, but could be just a fairly unique part of it.
In the example, if the threshold were normally 30, it would be lowered to 30 - 10 = 20 for any messages posted by JOHNSMITH@... John Smith would have to stay under 20 points to avoid being blocked. SPAMMER would be blocked no matter what, since 250 is much greater than 30.
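There's also a Good Person list, where you can enter the names or e-mail addresses of people who deserve some extra leeway. Each person's number is added to the threshold value, so the person is less likely to be blocked. For example,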
NICEGUY@HOTMAIL.COM, 10
THEBOSS@MYCOMPANY.COM, 50
Here you are increasing the threshold on all messages posted by these two people. NICEGUY is given an extra 10 points, so if the normal threshold is 30 for other people, he will only have to stay under 40.
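The effect of these per-person adjustments on the threshold can be sketched as follows. This is a simplified illustration using the example entries above; the function name effective_threshold is hypothetical, and we assume a simple case-insensitive substring match against the FROM: line.

```python
BAD_PEOPLE = {"JOHNSMITH@": 10, "JUNKYARDDOG@YAHOO.COM": 20, "SPAMMER@HOTMAIL": 250}
GOOD_PEOPLE = {"NICEGUY@HOTMAIL.COM": 10, "THEBOSS@MYCOMPANY.COM": 50}

def effective_threshold(from_line: str, base: int = 30) -> int:
    """Threshold for one message after per-person adjustments.

    Assumption: each entry is matched as a case-insensitive substring
    of the FROM: line; bad entries lower the threshold, good ones raise it.
    """
    sender = from_line.upper()
    threshold = base
    for pattern, amount in BAD_PEOPLE.items():
        if pattern in sender:
            threshold -= amount
    for pattern, amount in GOOD_PEOPLE.items():
        if pattern in sender:
            threshold += amount
    return threshold

print(effective_threshold("From: John Smith <johnsmith@yahoo.com>"))  # 20
print(effective_threshold("From: niceguy@hotmail.com"))               # 40
print(effective_threshold("From: spammer@hotmail.com"))               # -220
```

A negative threshold, as in the last case, means that every message from that address is blocked, whatever its score.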
The threshold value (the number of penalty points that will cause a message to be blocked) is usually at least 30. Whenever a message is blocked by the moderation program, it will be sent to you for a second opinion. If you happen to be away from your computer, perhaps at home, sleeping, or on vacation, there could be a huge delay before you can approve a message that was unfairly blocked. If you have an assistant, you could change the e-mail address where blocked messages are sent, and let your assistant take over. Another alternative is to vary the threshold value according to the time of day. During the hours that you are available to check your e-mail, you could have a strict threshold. During other hours, when you are usually away from your computer, you could have a looser threshold, say 50. That way you can reduce the chance that an acceptable message will be blocked and left in limbo for several hours.
For example, suppose you normally check your e-mail at 10:00am and 5pm each day. You might set up a schedule like this, where things get tighter as the time approaches for your e-mail check:
12am: 50   1am: 50   2am: 50   3am: 50   4am: 50   5am: 45
6am: 40   7am: 35   8am: 30   9am: 30   10am: 50   11am: 50
12pm: 45   1pm: 40   2pm: 35   3pm: 30   4pm: 30   5pm: 50
6pm: 50   7pm: 50   8pm: 50   9pm: 50   10pm: 50   11pm: 50
The times shown are the starting times for each one hour period. For example, 7am: 35, means set the threshold to 35 during the period 7am to 7:59am. Currently, if you have a much different schedule on the weekend vs. weekdays, you will have to change things on Friday afternoon and Monday morning. In future, we may have two different daily schedules that you can record.
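One way to picture the schedule is as a table of 24 values, one per hour, indexed by the hour in which a message arrives. The Python sketch below uses the example schedule; the names SCHEDULE and threshold_at are hypothetical, and which time zone the program uses is not specified here.

```python
from datetime import time

# Threshold for each one-hour period, starting at 12am (hour 0).
SCHEDULE = [
    50, 50, 50, 50, 50, 45,   # 12am - 5am
    40, 35, 30, 30, 50, 50,   # 6am - 11am
    45, 40, 35, 30, 30, 50,   # 12pm - 5pm
    50, 50, 50, 50, 50, 50,   # 6pm - 11pm
]

def threshold_at(when: time) -> int:
    """Threshold in force at the given time of day."""
    return SCHEDULE[when.hour]

print(threshold_at(time(7, 59)))   # 35: still in the 7am period
print(threshold_at(time(10, 0)))   # 50: loosened again after the 10am e-mail check
```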
send approved messages to:
This appears for Topica lists only. It lets you specify the address where approved messages should be sent. The address includes a password that you set on Topica's Web site. You must set this address correctly for messages to get through to your subscribers.
send rejected messages to:
This is the address where rejected messages will be sent.
This is normally your address, or the address of your assistant.
Messages will come from moderator@ListFilter.com and will have
a subject line something like:
Subject: BAD(45/30) <8 CASINO> <7 CASINO> <30 OffTopic>
indicating the overall score, the threshold, and a list of reasons for that score.
On MSN or Yahoo this address must be designated as one that has permission to approve or reject messages. You will approve or reject messages in the same way that you normally would on these systems. Instructions are included in the message.
On Topica, this can be any address, perhaps not even subscribed to the list, since to approve a message, you will forward it back to moderator@ListFilter.com, which is considered by Topica to be the sole moderator of your list. When the moderation program sees the "Fwd:" in the subject line, it will "rubber-stamp" the message, and send it to Topica, no matter what score it gets. To reject a message, just ignore or delete it.
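If you want your own mail software to sort rejected messages, the subject line format shown above is easy to pick apart. The following Python sketch assumes the subject always looks like the example (the text only says "something like"), so treat the patterns and the parse_subject name as illustrative rather than guaranteed.

```python
import re

SCORE_RE = re.compile(r"BAD\((\d+)/(\d+)\)")    # BAD(score/threshold)
REASON_RE = re.compile(r"<(\d+) ([^<>]+)>")     # <points reason>

def parse_subject(subject: str):
    """Extract score, threshold and reasons; return None if the format differs."""
    match = SCORE_RE.search(subject)
    if match is None:
        return None
    reasons = [(int(points), label) for points, label in REASON_RE.findall(subject)]
    return {
        "score": int(match.group(1)),
        "threshold": int(match.group(2)),
        "reasons": reasons,
    }

info = parse_subject("Subject: BAD(45/30) <8 CASINO> <7 CASINO> <30 OffTopic>")
print(info["score"], info["threshold"])
print(info["reasons"])
```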
Threshold Reduction for New Posters:
This reduction in the threshold value is applied whenever a new person posts his first message. New people might be spammers, previously-bad people who have resubscribed under a new address, or people who simply don't know the etiquette of the list. It's a good idea to scrutinize them more carefully.
Max Score to allow New Poster into Database:
When a new person posts a reasonable message, well below the threshold, the program will add him to the database, and no longer apply the threshold reduction to his messages. This value is the maximum score allowed to let a new person into the database and remove him from "suspicion".
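The two new-poster settings work together, roughly as in the Python sketch below. The values 10 and 15 are made-up settings for the example, the "database" is reduced to a set of addresses, and we assume that a score equal to the maximum still lets the poster in.

```python
NEW_POSTER_REDUCTION = 10    # hypothetical "Threshold Reduction for New Posters"
MAX_SCORE_TO_ADMIT = 15      # hypothetical "Max Score to allow New Poster into Database"

def threshold_for(address: str, base_threshold: int, known_posters: set) -> int:
    """Apply the new-poster reduction unless the address is already known."""
    if address.lower() in known_posters:
        return base_threshold
    return base_threshold - NEW_POSTER_REDUCTION

def record_post(address: str, score: int, known_posters: set) -> None:
    """Admit a new poster to the database if the message scored low enough."""
    if score <= MAX_SCORE_TO_ADMIT:
        known_posters.add(address.lower())

known = set()
print(threshold_for("newbie@example.com", 30, known))   # 20: under suspicion
record_post("newbie@example.com", 4, known)             # a reasonable first message
print(threshold_for("newbie@example.com", 30, known))   # 30: no longer reduced
```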
Max consecutive lines of quoted material:
Topica only: this lets you automatically chop long sections of quoted material. People are often lazy, making a one line reply and then quoting a 500-line message that people saw recently. The long section will be terminated at the value you set, and "<snip>" will be inserted at that point.
Max consecutive empty quoted lines:
Topica only: Sometimes there's a quoted section in a message that has a bunch of blank lines. This lets you tidy that up.
Max size of message with no penalty:
Messages up to this size (in bytes) will have no penalty assigned for being too long.
Number of bytes per penalty point in oversize messages:
Once a message exceeds the maximum size with no penalty, it starts getting penalized at the rate of one point per so-many bytes. Large messages may be painful for people to download if they have slow dial-up connections. For example, you might specify that the penalty is 0 up to 5000 bytes, and then 1 point per 1000 bytes above 5000. Penalizing long messages will also help you to catch messages with large attachments. These could be worms or viruses. ListFilter also looks for any attachment with an apparent executable file type, such as .exe, .bat, .com, .scr, .pif, .lnk, .vbs, etc.
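Using the example settings of 5000 bytes and 1000 bytes per point, the size penalty can be sketched in Python as below. We assume that partial blocks of bytes are not charged (the result is rounded down); the real program may round differently. The extension check covers only the file types named above, and both function names are hypothetical.

```python
EXECUTABLE_EXTENSIONS = (".exe", ".bat", ".com", ".scr", ".pif", ".lnk", ".vbs")

def size_penalty(message_bytes: int, free_bytes: int = 5000,
                 bytes_per_point: int = 1000) -> int:
    """Penalty points for an oversize message.

    Assumption: only complete blocks of bytes_per_point above the
    free limit are charged (integer division).
    """
    excess = message_bytes - free_bytes
    if excess <= 0:
        return 0
    return excess // bytes_per_point

def looks_executable(filename: str) -> bool:
    """True if the attachment name ends in one of the listed file types."""
    return filename.lower().endswith(EXECUTABLE_EXTENSIONS)

print(size_penalty(4200))                 # 0: under the free limit
print(size_penalty(8000))                 # 3: 3000 bytes over, 1 point per 1000
print(looks_executable("Greeting.SCR"))   # True
```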
Each entry shows the poster's address, the subject, the score computed by the moderation program and the threshold. Detailed scoring information is in the form of a list of bracketed numbers with words after them, e.g.
Mon 2:01pm JOHN SMITH <JOHNSMITH@YAHOO.COM>
RE: WISH LIST
SCORE: 8 <5 DAMN><3 OffTopic, 17 good / 779 bytes>
THRESHOLD: 55 <35 time_of_day><20 Good Person>
This shows that the score is 8, which equals 5 for using the word DAMN, plus an off-topic rating of only 3 (good). The off-topic rating is based on 17 good word points in a message body of size 779 bytes.
The threshold value is based on 35 for this time of day, plus some extra leeway of 20 points that you've assigned to John Smith, since he is a better than average poster to this list.
Rejected messages will have a score that equals or exceeds the threshold. They will also have "*BAD*" beside them.
The moderation program checks for, and deletes, any messages that are duplicates of recent messages. These messages have "*DUP*" beside them.
Topica only: rejected messages that you subsequently approve, by forwarding them back to moderator@ListFilter.com, have *OK* beside them. There is no 2-cent charge for these messages.
Total Posted, Recent Average, Ads, Poster, Recent History
39    3.5   ad  "JOHN SMITH" (4, 5, 2, 8, 2, 2, 3, 7, 1, 1)
11   11.3       "CARL JONES" (13, 3, 4, 3, 31, 24, 17, 5, 4, 9)
This could be of use to you in choosing Bad People and Good People.
Topica only: There's an indication of which posters have advertising in their messages. ListFilter automatically snips typical Yahoo and HotMail ads at the end of messages.
In addition to the typical Yahoo, Hotmail, and other ads that are handled automatically, you can enter a list of specific advertisements that you want snipped out. You have to give the first line of the ad, plus the maximum number of additional lines to snip. The system will stop snipping automatically when it comes to a very short line, but otherwise will snip until it reaches the maximum you have set.
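The snipping rule for your own ads can be pictured with the Python sketch below. Several details are assumptions made for the example: the first line must match exactly (ignoring surrounding spaces), a "very short line" is taken to be one under 10 characters, and the short line itself is kept. The snip_ad name is hypothetical.

```python
def snip_ad(lines, first_line, max_extra, short_len=10):
    """Remove an ad that starts with first_line.

    After the first line, up to max_extra further lines are removed,
    stopping early at a very short line (assumed: under short_len characters).
    """
    kept = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == first_line.strip():
            i += 1                      # drop the first line of the ad
            dropped = 0
            while (i < len(lines) and dropped < max_extra
                   and len(lines[i].strip()) >= short_len):
                i += 1                  # drop a following ad line
                dropped += 1
        else:
            kept.append(lines[i])
            i += 1
    return kept

message = [
    "See you at the game.",
    "Get your free widget today!",
    "Visit our site for details now.",
    "Offer good while supplies last.",
    "",
    "-- Bob",
]
print(snip_ad(message, "Get your free widget today!", max_extra=5))
```

In this example the ad's three lines are removed, and snipping stops at the empty line, so the signature survives.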
There are links for configuring your lists, adding funds to your account, changing your password, and logging off.
You should change your password from time to time if you are worried that someone might get in and mess up your settings.
If you forget your password, contact Rapid Deployment Software.
When you log off, the system forgets who you are (destroys your "cookie"), so that you will have to log in again. This is useful in case you want to leave your computer unattended for a while, and not allow anyone to tamper with your settings. The system will also forget who you are whenever you close your Web browser.
Q: I subscribe to a mailing list. Can I use ListFilter?
A: ListFilter can only be used by someone who is the owner,
manager, or moderator, i.e. someone who's "in charge" of a
mailing list. The average subscriber does not have permission
to decide how a list should be moderated, although he/she could
recommend ListFilter to someone in charge.
Q: Messages are taking a long time to get through. Is this the fault
of my mailing list service, or is it a problem with the moderation system?
A: Messages flow from the poster to the list service, then to the
moderation program, then back to the list service where they appear
on the message board on the Web. Finally, the list service distributes
the message via e-mail to those who have chosen that option.
Log in to ListFilter and check the Recent Messages.
The messages listed here are ones that have already been moderated and
either sent back to your list service for distribution, or rejected
and sent to you for a second opinion.
The moderation system usually processes incoming messages within
a few minutes of receiving them. Long delays in message traffic
are generally due to your list service having built up a huge backlog
of messages to send out. This backlog usually affects all lists
and is sorted out by the list service within a few hours.
Q: Will the poster of a bad message be informed when his message is
blocked?
A: On Yahoo and MSN (but not Topica) you can set an option to
automatically inform people. This option is a feature of the list service,
not ListFilter.
Blocking a message can be a very subjective decision on your part,
and very few posters will agree that their message should be blocked.
By telling the poster why he was blocked, you could trigger a nasty
and time-wasting exchange of e-mails. It may be better to simply tell
your subscribers that your policy is to block off-topic posts without
informing anyone, and that you don't guarantee that this will always
be done fairly or consistently. In many cases the poster won't even notice
that his message was blocked, and will write it off as slowness or
unreliability in the system.
Q: Is it better to unsubscribe a bad person, or just put him on the
Bad Person list with a very high score?
A: If you get a really bad person or spammer, it may be better
to give him an impossibly high score on the Bad Person list.
This way, he won't get any feedback and won't
know the best way (for him) to proceed. For instance he may waste
his time trying to post again from the same address, rather than
resubscribing with a new address. He may also conclude that there's
a human carefully reading all messages, so there's no chance to ever
get his bad messages through.
Q: Should I let people know that I am using an automatic system for
moderation?
A: It will be better if you don't tell too many subscribers that you are
using an automatic moderation system. If a trouble-maker knows
that a program is doing the moderating, he could start playing games
to defeat the system. You can ultimately win these games, but it
will be a waste of time.
Q: I can ban someone by giving his e-mail address a high score on
the Bad Person list, but he can easily sign up for a new e-mail address
on Yahoo, HotMail or other free service and then resubscribe.
What can I do?
A: If you have a trouble-maker who subscribes under many different
addresses, you are still protected by the Bad Words list
and other tools. To catch more of his messages, you can add to the
Bad Word list any unusual words that he likes to use, or any words
that he frequently misspells. Also keep in mind that new people
(i.e. new addresses) have a lower threshold the first time
they post, so you may catch him that way.
Q: What can I do about people who are on-topic, but post too many
messages, and are frequently boring or annoying?
A: The system is set up to delay people who post a large percentage
of messages over a relatively short period of time. The amount
of delay depends on whether it's a "good" person, a "bad" person,
or someone else. The delay also depends on the score given to
the message. Delays can be anywhere from about 5 minutes to
about 30 minutes.