ListFilter Moderation System

User Manual

 

(c) 2005 Rapid Deployment Software

 

Permission is freely granted to anyone
to copy this manual.

 


Contents

Introduction
Getting Started
Moderation Parameters - Introduction
Bad Words
Good Words
Bad People
Good People
Hourly Thresholds
General Settings
Recent Messages
Poster Performance
Advertising Snipper
Viewing Your Account Information
Adding Funds to Your Account
Changing Your Password
Logging Off
Frequently Asked Questions

Introduction

ListFilter automatically performs moderation on a mailing list. This eliminates the need for a human moderator, while ensuring that most off-topic or otherwise offensive messages are blocked.

The user of the system will be a mailing list owner (or the owner's assistant). Acceptable messages are forwarded immediately by the system to the user's mailing list for distribution, while any questionable messages are forwarded to the user (the list owner). Since over 95% of the messages are likely to be good ones, the moderator only has to deal with the remaining 5%.

The moderation program runs 24 hours per day, 7 days per week on our server. Message traffic can keep flowing at all times, without the list owner having to constantly check for new messages and forward them to the list.

All messages posted on a given mailing list are assigned a score, based on keywords detected, message size and other measurements. The list owner can tune these measurements to suit his list, and his moderation policy.

The list owner can set the policy, and let the moderation program enforce that policy. The list owner is free to be away from the list for long periods of time, secure in the knowledge that messages are flowing through with little delay, and each one is being checked carefully.

Getting Started

After signing up with Rapid Deployment Software, you will be given a login name and password for your mailing list.

The first thing you need to do after logging in is to go into General Settings (see below) and check that the e-mail address for rejected messages is correct.

If you have a Topica list, you will also have to enter the address for forwarding acceptable messages. This address includes a password that you create with Topica.

Initially, very few moderation parameters will be set up, other than a generic list of "bad" words, initialized for your convenience by Rapid Deployment Software (RDS). You may want to set up other parameters (see below) before you switch moderation over to this program.

When you are ready to have messages automatically moderated, you will simply tell your mailing list system that moderator@ListFilter.com is the moderator. If anything goes wrong, or you don't like the way the system is working, you can change the moderator back to your own address, or take the list out of moderation altogether.

At any time, you can adjust your moderation parameters. The changes take effect immediately, and will apply to the next message posted to your list.

Moderation Parameters - Introduction

There are several moderation parameters that you can adjust. Everything is based on a flexible scoring system. If a message accumulates too many penalty points, reaching or exceeding a threshold that you set, it will be forwarded to you. You can then approve or reject it manually. Messages that stay under the threshold are automatically approved and forwarded to your list server for distribution. A typical value for the threshold is 30 points, but you can adjust that, and even vary it according to the time of day (see below) and the person who is posting the message.

In the following sections we describe how the scoring system works, and how you can customize it for your list. Everything can be controlled via a Web interface at ListFilter.com. Just log in and click the name of your list.

Bad Words

You have a list of "bad" words or phrases that might indicate an offensive or off-topic message. Initially, this list will contain over 100 words that we have chosen. Each word in the list carries a certain penalty value, as indicated by the number beside it. When you click "Save Changes" the system automatically sorts the list, and puts all words into upper case. For example,

       
        1-800-, 12
        CASINO, 8
        DAMN, 5
        OFFER EXPIRES, 10
 

You should modify this initial set of words and values to suit the subject of your mailing list, and your own policy. Some words are there to catch typical SPAM. Others are there to catch foul language. Depending on the nature of your mailing list, you might want to allow foul language, and if your list happens to be about gambling you would obviously delete the entry for the word "CASINO". Software developers might want to enter passwords, registration codes, etc. that some people on the list know, but others are expected to pay for. With ListFilter, you have a program running 24 hours per day, reading every message, ensuring that critical information does not get leaked, either accidentally or deliberately.

The moderation program checks every line in every message, looking for bad words. Bad words are matched without regard to upper/lower case. So, CASINO would match CASINO, Casino, casino, casinos, etc. When a bad word on your list is 3 characters or fewer, it must match a complete word, not just a substring of a longer word. Otherwise short words might trigger too many spurious matches.

The score goes up with each occurrence of any bad word, but the penalty is reduced by 20% for a second occurrence of the same bad word, and another 20% for the third occurrence, and so on. Eventually the penalty (rounded to the nearest integer) will reach zero. This helps to reduce false alarms. It's more significant to see two different, equally bad, words appearing in a message, than it is to see two occurrences of the same bad word.

For example, 3 occurrences of CASINO would generate a score of 8 + 6 + 5 (rounded off to the nearest integer), rather than 8 + 8 + 8 as you might expect.

So, for example, a message that contained CASINO and OFFER EXPIRES and 1-800- and another CASINO would be scored as:

   
    8 + 10 + 12 + 6 = 36
 

If the threshold was 30, the message would not be approved. It would be forwarded to you for evaluation.
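
The scoring rules described above can be sketched in a few lines of Python. This is only an illustration, not the actual ListFilter code. The function name bad_word_score and the decay parameter are invented for this example, and the sketch assumes that each repeat of a word is worth 80% of the previous occurrence, which reproduces the 8 + 6 + 5 example above.

```python
import re

def bad_word_score(text, bad_words, decay=0.8):
    """Return the total bad-word penalty for a message (illustrative sketch).

    bad_words maps an upper-case word or phrase to its penalty value.
    decay=0.8 is an assumption: each repeat is worth 20% less than the last.
    """
    upper = text.upper()  # matching ignores upper/lower case
    total = 0
    for word, penalty in bad_words.items():
        if len(word) <= 3:
            # Short entries must match a complete word, not a substring.
            pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        else:
            pattern = re.escape(word)
        occurrences = len(re.findall(pattern, upper))
        for n in range(occurrences):
            points = round(penalty * decay ** n)  # rounded to an integer
            if points == 0:
                break  # later repeats of this word add nothing
            total += points
    return total

words = {"1-800-": 12, "CASINO": 8, "DAMN": 5, "OFFER EXPIRES": 10}
message = ("Visit our casino! Offer expires soon. "
           "Call 1-800-555-0100. Best CASINO in town.")
print(bad_word_score(message, words))  # 36
```

With a threshold of 30, the sample message would be forwarded to the list owner, just as in the worked example.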

Note that this scoring system is more subtle than simply saying that any message containing CASINO must be blocked, or any message with an 800-number must be blocked. We'll soon see how even more subtlety can be added to the system. The goal is to create an artificially intelligent moderator that can make the right decision in almost all cases, while letting you have the final say about rejecting messages.

Good Words

In addition to a list of "bad" words, you also have a list of "good" words. These are on-topic words specific to the subject matter of your mailing list.

Suppose your mailing list was about baseball.

Your list of good words might contain:

 BASEBALL, 3
 INNING, 3
 STRIKE, 2
 BAT, 2
 WIN, 1
 LOSE, 1
 

Here you have a list of words that might come up in a discussion of baseball. Some words, like BASEBALL and INNING, are highly-specific to baseball and are unlikely to come up in other conversations. These words are given a score of 3. Other words, like STRIKE and BAT, are common in discussions of baseball, but could come up in other subjects. These words get a score of 2. Finally, words like WIN and LOSE are common to all sports and many other areas of life as well, so they are scored as just 1 point.

The moderation program will count up the occurrences of good words in each message. Each occurrence adds its score to the total. The score for a good word remains fixed, no matter how many times that word occurs in a message.

The program will compute the density of good word points, i.e. the total number of points divided by the total number of bytes in the message. An on-topic message should have a reasonable number of points for its size. A long message with a very low point-count will be assessed a large penalty for being off-topic. This penalty might not be enough to block the message, but undesirable off-topic messages often incur other penalties. For example, they may be SPAM messages containing typical SPAM phrases. Note that almost all messages are assigned some small penalty for being off-topic, usually from 1 to 10 penalty points, which is unlikely to cause the message to be blocked, assuming a threshold of 30.

To get accurate off-topic penalties, it's recommended that you have 100 or more on-topic words. The system automatically increases the weight of the off-topic penalty as you add more good words. With a small number of words, the program can't confidently say that a message is truly off-topic, so the penalty will be small.
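
The density calculation described above can be illustrated with a short Python sketch. This manual does not give the formula that turns the density into an off-topic penalty, so the sketch stops at the density itself. The function name good_word_density is invented for this example, and simple case-insensitive substring matching is an assumption.

```python
def good_word_density(text, good_words):
    """Return (points, size_in_bytes, density) for a message (illustrative)."""
    upper = text.upper()
    # Every occurrence adds the word's full score; there is no reduction
    # for repeats, unlike bad words.
    points = sum(upper.count(word) * value for word, value in good_words.items())
    size = len(text.encode("utf-8"))
    density = points / size if size else 0.0
    return points, size, density

good = {"BASEBALL": 3, "INNING": 3, "STRIKE": 2, "BAT": 2, "WIN": 1, "LOSE": 1}
text = "The baseball game went to the ninth inning before the win."
points, size, density = good_word_density(text, good)
print(points)  # 7
```

A long message with few or no good words would have a density near zero, which is the situation that attracts a large off-topic penalty.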

Bad People

There's a Bad Person list where you can enter the names or e-mail addresses of people who tend to cause trouble. Each person can be assigned a number according to how bad he is. This number is subtracted from the threshold value, so the person is more likely to be blocked. For example,

 JOHNSMITH@, 10
 JUNKYARDDOG@YAHOO.COM, 20
 SPAMMER@HOTMAIL, 250
 

The program will match these strings against the FROM: line in the e-mail message. If there's a match, the threshold will be lowered by the corresponding amount. The string you enter is normally the complete e-mail address, but could be just a distinctive part of it.

In the example, if the threshold were normally 30, it would be lowered to just 30 - 10 = 20 for any messages posted by JOHNSMITH@... John Smith would have to stay under 20 points to avoid being blocked. SPAMMER would be blocked no matter what, since 250 is much greater than 30.

Good People

Similar to the Bad Person list, there's also a Good Person list. These are people you feel are unlikely to post a bad message (or people you are afraid to block, for political reasons!). By putting them on the Good Person list, you reduce the chance that one of their messages will be blocked. Feel free to put yourself on this list!

For example,

    NICEGUY@HOTMAIL.COM, 10
    THEBOSS@MYCOMPANY.COM, 50

Here you are increasing the threshold on all messages posted by these two people. NICEGUY is given an extra 10 points, so if the normal threshold is 30 for other people, he will only have to stay under 40.
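
The effect of the Bad Person and Good Person lists on the threshold can be sketched in Python as follows. This is an illustration only. The function name effective_threshold is invented, and the sketch assumes that if several entries match the same FROM: line, all of their adjustments are applied.

```python
def effective_threshold(from_line, base, bad_people, good_people):
    """Adjust the base threshold for the sender of a message (illustrative)."""
    sender = from_line.upper()  # list entries are stored in upper case
    threshold = base
    for entry, amount in bad_people.items():
        if entry in sender:
            threshold -= amount  # bad people are easier to block
    for entry, amount in good_people.items():
        if entry in sender:
            threshold += amount  # good people get extra leeway
    return threshold

bad = {"JOHNSMITH@": 10, "JUNKYARDDOG@YAHOO.COM": 20, "SPAMMER@HOTMAIL": 250}
good = {"NICEGUY@HOTMAIL.COM": 10, "THEBOSS@MYCOMPANY.COM": 50}

print(effective_threshold("From: John Smith <johnsmith@yahoo.com>", 30, bad, good))  # 20
print(effective_threshold("From: niceguy@hotmail.com", 30, bad, good))               # 40
print(effective_threshold("From: spammer@hotmail.com", 30, bad, good))               # -220
```

A negative threshold, as in the last case, means that every message from that address is blocked, whatever its score.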

Hourly Thresholds

The threshold value (the number of penalty points that will cause a message to be blocked) is usually at least 30. Whenever a message is blocked by the moderation program, it will be sent to you for a second opinion. If you happen to be away from your computer, perhaps at home, sleeping, or on vacation, there could be a huge delay before you can approve a message that was unfairly blocked. If you have an assistant, you could change the e-mail address where blocked messages are sent, and let your assistant take over. Another alternative is to vary the threshold value according to the time of day. During the hours that you are available to check your e-mail, you could have a strict threshold. During other hours, when you are usually away from your computer, you could have a looser threshold, say 50. That way you can reduce the chance that an acceptable message will be blocked and left in limbo for several hours.

For example, suppose you normally check your e-mail at 10am and 5pm each day. You might set up a schedule like this, where the threshold gets tighter as the time for your e-mail check approaches:

  12am: 50    1am: 50     2am: 50     3am: 50      4am: 50      5am: 45 
  
   6am: 40    7am: 35     8am: 30     9am: 30     10am: 50     11am: 50
  
  12pm: 45    1pm: 40     2pm: 35     3pm: 30      4pm: 30      5pm: 50
  
   6pm: 50    7pm: 50     8pm: 50     9pm: 50     10pm: 50     11pm: 50 
 

The times shown are the starting times for each one-hour period. For example, 7am: 35 means that the threshold is 35 from 7:00am to 7:59am. Currently, if your weekend schedule is much different from your weekday schedule, you will have to change the settings on Friday afternoon and Monday morning. In future, we may let you record two different daily schedules.
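
The schedule above amounts to a table with one threshold per hour of the day. A minimal Python sketch of the lookup, using the example values, might look like this. The names HOURLY_THRESHOLDS and threshold_at are invented for illustration, and the sketch assumes that the time passed in is already in the time zone that the schedule was written for.

```python
from datetime import datetime

# One entry per hour, starting at 12am (hour 0), copied from the example.
HOURLY_THRESHOLDS = [
    50, 50, 50, 50, 50, 45,   # 12am to 5am
    40, 35, 30, 30, 50, 50,   # 6am to 11am
    45, 40, 35, 30, 30, 50,   # 12pm to 5pm
    50, 50, 50, 50, 50, 50,   # 6pm to 11pm
]

def threshold_at(when):
    """Return the threshold in force at the given time (illustrative)."""
    return HOURLY_THRESHOLDS[when.hour]

print(threshold_at(datetime(2005, 6, 1, 7, 59)))   # 35
print(threshold_at(datetime(2005, 6, 1, 10, 0)))   # 50
```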

General Settings

Under General Settings there are several miscellaneous values that you can set.

send approved messages to:

This appears for Topica lists only. It lets you specify the address where approved messages should be sent. The address includes a password that you set on Topica's Web site. You must set this address correctly for messages to get through to your subscribers.

send rejected messages to:

This is the address where rejected messages will be sent. This is normally your address, or the address of your assistant. Messages will come from moderator@ListFilter.com and will have a subject line something like:

    Subject: BAD(45/30) <8 CASINO> <7 CASINO> <30 OffTopic>

indicating the overall score, the threshold, and a list of reasons for that score.

On MSN or Yahoo this address must be designated as one that has permission to approve or reject messages. You will approve or reject messages in the same way that you normally would on these systems. Instructions are included in the message.

On Topica, this can be any address, perhaps not even subscribed to the list, since to approve a message, you will forward it back to moderator@ListFilter.com, which is considered by Topica to be the sole moderator of your list. When the moderation program sees the "Fwd:" in the subject line, it will "rubber-stamp" the message, and send it to Topica, no matter what score it gets. To reject a message, just ignore or delete it.
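
If you want to sort or filter rejected messages in your own mail tools, the subject line described in this section can be picked apart with a short script. The Python sketch below is only an illustration. It assumes the subject looks exactly like the sample shown above, which the manual describes only as "something like" the real thing, and the function name parse_rejection_subject is invented.

```python
import re

def parse_rejection_subject(subject):
    """Extract score, threshold and reasons from a rejection subject line.

    Returns None if the subject does not contain a BAD(score/threshold) part.
    """
    header = re.search(r"BAD\((-?\d+)/(-?\d+)\)", subject)
    if header is None:
        return None
    # Each reason looks like <points description>.
    reasons = [(int(points), text.strip())
               for points, text in re.findall(r"<(\d+) ([^>]+)>", subject)]
    return {
        "score": int(header.group(1)),
        "threshold": int(header.group(2)),
        "reasons": reasons,
    }

info = parse_rejection_subject(
    "Subject: BAD(45/30) <8 CASINO> <7 CASINO> <30 OffTopic>")
print(info["score"], info["threshold"])  # 45 30
print(info["reasons"])  # [(8, 'CASINO'), (7, 'CASINO'), (30, 'OffTopic')]
```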

Threshold Reduction for New Posters:

This reduction in the threshold value is applied whenever a new person posts his first message. New people might be spammers, previously-bad people who have resubscribed under a new address, or people who simply don't know the etiquette of the list. It's a good idea to scrutinize them more carefully.

Max Score to allow New Poster into Database:

When a new person posts a reasonable message, well below the threshold, the program will add him to the database, and no longer apply the threshold reduction to his messages. This value is the maximum score allowed to let a new person into the database and remove him from "suspicion".
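
The two new-poster settings work together, and the following Python sketch shows one way to picture them. It is an illustration, not the real implementation. The names known_posters, threshold_for and record_post are invented, and the sketch assumes that a score equal to the maximum still lets the poster into the database.

```python
def threshold_for(address, base_threshold, known_posters, new_poster_reduction):
    """Lower the threshold for anyone not yet in the poster database."""
    if address.upper() in known_posters:
        return base_threshold
    return base_threshold - new_poster_reduction

def record_post(address, score, known_posters, max_score_for_entry):
    """Admit a new poster once he posts a message with a low enough score."""
    if score <= max_score_for_entry:
        known_posters.add(address.upper())

known = set()
print(threshold_for("newguy@example.com", 30, known, 10))  # 20

record_post("newguy@example.com", 25, known, 15)   # score too high, still new
print(threshold_for("newguy@example.com", 30, known, 10))  # 20

record_post("newguy@example.com", 4, known, 15)    # reasonable message
print(threshold_for("newguy@example.com", 30, known, 10))  # 30
```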

Max consecutive lines of quoted material:

Topica only: this lets you automatically chop long sections of quoted material. People are often lazy, making a one-line reply and then quoting a 500-line message that people saw recently. The long section will be terminated at the value you set, and "<snip>" will be inserted at that point.

Max consecutive empty quoted lines:

Topica only: Sometimes there's a quoted section in a message that has a bunch of blank lines. This lets you tidy that up.
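
The two quoted-material settings above can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the real implementation: it assumes that a quoted line is one that starts with ">", that an empty quoted line contains nothing but ">" characters and spaces, and that only the lines actually kept count toward the maximum. The function name tidy_quotes is invented.

```python
def tidy_quotes(lines, max_quoted, max_empty_quoted):
    """Chop long quoted sections and runs of empty quoted lines."""
    out = []
    kept = 0         # quoted lines kept in the current quoted section
    empties = 0      # consecutive empty quoted lines seen so far
    snipped = False  # whether "<snip>" was already inserted in this section
    for line in lines:
        if not line.startswith(">"):
            # An unquoted line ends the quoted section and resets the counters.
            kept = empties = 0
            snipped = False
            out.append(line)
            continue
        if kept >= max_quoted:
            if not snipped:
                out.append("<snip>")
                snipped = True
            continue  # drop the rest of this quoted section
        if line.strip("> \t") == "":
            empties += 1
            if empties > max_empty_quoted:
                continue  # drop surplus empty quoted lines
        else:
            empties = 0
        kept += 1
        out.append(line)
    return out

reply = ["Me too!"] + ["> line %d" % i for i in range(1, 6)]
print(tidy_quotes(reply, 3, 1))
# ['Me too!', '> line 1', '> line 2', '> line 3', '<snip>']
```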

Max size of message with no penalty:

Messages up to this size (in bytes) will have no penalty assigned for being too long.

Number of bytes per penalty point in oversize messages:

Once a message exceeds the maximum size with no penalty, it starts getting penalized at the rate of one point per so-many bytes. Large messages may be painful for people to download if they have slow dial-up connections. For example, you might specify that the penalty is 0 up to 5000 bytes, and then 1 point per 1000 bytes above 5000. Penalizing long messages will also help you to catch messages with large attachments. These could be worms or viruses. ListFilter also looks for any attachment with an apparent executable file type, such as .exe, .bat, .com, .scr, .pif, .lnk, .vbs, etc.
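
The size penalty from the example above (0 points up to 5000 bytes, then 1 point per 1000 bytes) can be written out as a short Python sketch. This is an illustration only. The function names are invented, the sketch assumes that partial blocks of bytes are rounded down, and the extension list contains only the file types named above.

```python
def size_penalty(size_bytes, free_bytes=5000, bytes_per_point=1000):
    """Penalty points for an oversize message (illustrative sketch)."""
    if size_bytes <= free_bytes:
        return 0
    # Assumption: only complete blocks of bytes_per_point bytes are counted.
    return (size_bytes - free_bytes) // bytes_per_point

EXECUTABLE_EXTENSIONS = (".exe", ".bat", ".com", ".scr", ".pif", ".lnk", ".vbs")

def looks_executable(filename):
    """True if an attachment name ends in one of the listed file types."""
    return filename.lower().endswith(EXECUTABLE_EXTENSIONS)

print(size_penalty(4000))    # 0
print(size_penalty(8000))    # 3
print(looks_executable("HOLIDAY_PHOTOS.SCR"))  # True
print(looks_executable("notes.txt"))           # False
```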

Recent Messages

Each entry shows the poster's address, the subject, the score computed by the moderation program and the threshold. Detailed scoring information is in the form of a list of bracketed numbers with words after them, e.g.

   Mon  2:01pm JOHN SMITH <JOHNSMITH@YAHOO.COM>
               RE: WISH LIST
               SCORE:  8 <5 DAMN><3 OffTopic, 17 good / 779 bytes>
               THRESHOLD: 55 <35 time_of_day><20 Good Person>
 

This shows that the score is 8, which equals 5 for using the word DAMN, plus an off-topic rating of only 3 (good). The off-topic rating is based on 17 good word points in a message body of size 779 bytes.

The threshold value is based on 35 for this time of day, plus some extra leeway of 20 points that you've assigned to John Smith, since he is a better than average poster to this list.

Rejected messages will have a score that equals or exceeds the threshold. They will also have "*BAD*" beside them.
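
To show how an entry like the one above adds up, here is a small Python sketch that totals the score parts and the threshold parts and then applies the rule that a message is rejected when its score equals or exceeds the threshold. The function name evaluate and the (points, reason) layout are invented for this illustration.

```python
def evaluate(score_parts, threshold_parts):
    """Return (score, threshold, rejected) from lists of (points, reason)."""
    score = sum(points for points, _ in score_parts)
    threshold = sum(points for points, _ in threshold_parts)
    return score, threshold, score >= threshold

# The John Smith entry shown above.
print(evaluate([(5, "DAMN"), (3, "OffTopic")],
               [(35, "time_of_day"), (20, "Good Person")]))
# (8, 55, False)

# The CASINO example from the Bad Words section, with a threshold of 30.
print(evaluate([(8, "CASINO"), (10, "OFFER EXPIRES"), (12, "1-800-"), (6, "CASINO")],
               [(30, "time_of_day")]))
# (36, 30, True)
```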

The moderation program checks for, and deletes, any messages that are duplicates of recent messages. These messages have "*DUP*" beside them.

Topica only: rejected messages that you subsequently approve, by forwarding them back to moderator@ListFilter.com, have *OK* beside them. There is no 2-cent charge for these messages.

Poster Performance

The system keeps track of posters and their recent scores. For example:

 Total Posted, Recent Average, Ads, Poster, Recent History
 
   39   3.5  ad  "JOHN SMITH"   (4, 5, 2, 8, 2, 2, 3, 7, 1, 1)
   11  11.3      "CARL JONES"   (13, 3, 4, 3, 31, 24, 17, 5, 4, 9)
 

This could be of use to you in choosing Bad People and Good People.
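
The figures in the table above can be reproduced with a short Python sketch. This is only an illustration. The class name PosterStats is invented, and the sketch assumes that the recent history holds the last 10 scores, since that is how many the example shows.

```python
from collections import deque

class PosterStats:
    """Track one poster's message count and recent scores (illustrative)."""

    def __init__(self, history_size=10):
        self.total_posted = 0
        # A deque with maxlen drops the oldest score automatically.
        self.recent = deque(maxlen=history_size)

    def record(self, score):
        self.total_posted += 1
        self.recent.append(score)

    def recent_average(self):
        if not self.recent:
            return 0.0
        return round(sum(self.recent) / len(self.recent), 1)

carl = PosterStats()
for score in (40, 13, 3, 4, 3, 31, 24, 17, 5, 4, 9):
    carl.record(score)

print(carl.total_posted)      # 11
print(carl.recent_average())  # 11.3
```

In this run the first score (40) has dropped out of the history, so the average covers only the last 10 messages.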

Topica only: There's an indication of which posters have advertising in their messages. ListFilter automatically snips typical Yahoo and Hotmail ads at the end of messages.

Advertising Snipper (Topica only)

In addition to the typical Yahoo, Hotmail, and other ads that are handled automatically, you can enter a list of specific advertisements that you want snipped out. You have to give the first line of the ad, plus the maximum number of additional lines to snip. The system will stop snipping automatically when it comes to a very short line, but otherwise will snip until it reaches the maximum you have set.
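
The snipping rule described above can be pictured with the following Python sketch. It is an illustration under stated assumptions. The manual does not say how short a "very short line" is, so the short_len value of 5 characters is a guess, and the sketch assumes that the short line itself is kept. The function name snip_ad is invented.

```python
def snip_ad(lines, first_line, max_extra, short_len=5):
    """Remove an ad that starts with first_line (illustrative sketch)."""
    out = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == first_line.strip():
            i += 1  # drop the first line of the ad
            extra = 0
            # Keep snipping until the maximum is reached or a very short
            # line turns up.
            while (i < len(lines) and extra < max_extra
                   and len(lines[i].strip()) >= short_len):
                i += 1
                extra += 1
            continue
        out.append(lines[i])
        i += 1
    return out

body = [
    "See you at the game.",
    "",
    "Get your free widget today!",
    "Visit example.com for details",
    "Limited time only, act now",
    "",
    "Bye",
]
print(snip_ad(body, "Get your free widget today!", 5))
# ['See you at the game.', '', '', 'Bye']
```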

Viewing Your Account Information

When you log in, you'll see a summary of the mailing lists that you own, along with a statement of the money that they've used, the payments that you've made, and your account balance.

There are links for configuring your lists, adding funds to your account, changing your password, and logging off.

Adding Funds to Your Account

You can add funds to your account, quickly and securely, via DigiBuy's credit card processing Web site. Just click the link after logging in to your account. Any multiple of $25 can be added. Be sure to enter your user id on the second page at DigiBuy, so we can credit the correct account.

Changing Your Password

You should change your password from time to time if you are worried that someone might get in and mess up your settings.

If you forget your password, contact Rapid Deployment Software.

Logging Off

When you log off, the system forgets who you are (destroys your "cookie"), so that you will have to log in again. This is useful in case you want to leave your computer unattended for a while, and not allow anyone to tamper with your settings. The system will also forget who you are whenever you close your Web browser.

Frequently Asked Questions

 Q: I subscribe to a mailing list. Can I use ListFilter?
 
 A: ListFilter can only be used by someone who is the owner,
    manager, or moderator, i.e. someone who's "in charge" of a
    mailing list. The average subscriber does not have permission
    to decide how a list should be moderated, although he/she could
    recommend ListFilter to someone in charge.
 
 Q: Messages are taking a long time to get through. Is this the fault
    of my mailing list service, or is it a problem with the moderation system?
 
 A: Messages flow from the poster to the list service, then to the
    moderation program, then back to the list service where they appear
    on the message board on the Web. Finally, the list service distributes
    the message via e-mail to those who have chosen that option.
 
    Log in to ListFilter and check the Recent Messages.
    The messages listed here are ones that have already been moderated and
    either sent back to your list service for distribution, or rejected
    and sent to you for a second opinion.
    
    The moderation system usually processes incoming messages within 
    a few minutes of receiving them. Long delays in message traffic 
    are generally due to your list service having built up a huge backlog 
    of messages to send out. This backlog usually affects all lists
    and is sorted out by the list service within a few hours.
 
 Q: Will the poster of a bad message be informed when his message is
    blocked?
    
 A: On Yahoo and MSN (but not Topica) you can set an option to 
    automatically inform people. This option is a feature of the list service, 
    not ListFilter.
    
    Blocking a message can be a very subjective decision on your part,
    and very few posters will agree that their message should be blocked. 
    By telling the poster why he was blocked, you could trigger a nasty 
    and time-wasting exchange of e-mails. It may be better to simply tell 
    your subscribers that your policy is to block off-topic posts without 
    informing anyone, and that you don't guarantee that this will always 
    be done fairly or consistently. In many cases the poster won't even notice
    that his message was blocked, and will write it off as slowness or 
    unreliability in the system.
 
 Q: Is it better to unsubscribe a bad person, or just put him on the
    Bad Person list with a very high score?
    
 A: If you get a really bad person or spammer, it may be better 
    to give him an impossibly high score on the Bad Person list.
    This way, he won't get any feedback and won't
    know the best way (for him) to proceed. For instance he may waste
    his time trying to post again from the same address, rather than
    resubscribing with a new address. He may also conclude that there's
    a human carefully reading all messages, so there's no chance to ever 
    get his bad messages through.
 
 Q: Should I let people know that I am using an automatic system for
    moderation?
  
 A: It will be better if you don't tell too many subscribers that you are
    using an automatic moderation system. If a trouble-maker knows
    that a program is doing the moderating, he could start playing games
    to defeat the system. You can ultimately win these games, but it
    will be a waste of time.
 
 Q: I can ban someone by giving his e-mail address a high score on 
    the Bad Person list, but he can easily sign up for a new e-mail address 
    on Yahoo, Hotmail or other free service and then resubscribe. 
    What can I do?
    
 A: If you have a trouble-maker who subscribes under many different
    addresses, you are still protected by the Bad Words list
    and other tools. To catch more of his messages, you can add to the
    Bad Word list any unusual words that he likes to use, or any words 
    that he frequently misspells. Also keep in mind that new people 
    (i.e. new addresses) have a lower threshold the first time 
    they post, so you may catch him that way.
 
 Q: What can I do about people who are on-topic, but post too many
    messages, and are frequently boring or annoying?
 
 A: The system is set up to delay people who post a large percentage
    of messages over a relatively short period of time. The amount
    of delay depends on whether it's a "good" person, a "bad" person,
    or someone else. The delay also depends on the score given to
    the message. Delays can be anywhere from about 5 minutes to 
    about 30 minutes.