
The Next Step in the Spam Control War: Grey Listing (Part 2)

Although not very recent, this is an informative paper that proposes a very effective method of enhancing the abilities of mail systems to limit the amount of spam that they receive and deliver to their users. Greylisting is being implemented more frequently these days.

If you missed the previous article please read The Next Step in the Spam Control War: Grey Listing (Part 1).

Copyright © 2003-2004. Permission to reprint and translate is granted provided this copyright notice is kept intact.

Suggestions for more effective protection of email domains

Greylisting will not be nearly as effective against spam unless ALL of the MX hosts for a particular domain use mail software that incorporates it.

A fair number of spamming software packages are already smart enough to retry delivery to other MX hosts for a domain if delivery through one MX fails. Presumably all MX hosts will be whitelisted for each other (what is the point of delaying acceptance of email from a host that you know is a real MTA that will retry?). So if the spammers can deliver to one of the MXs without a delay, then you have no more protection than you did before.

In addition, Greylisting, while already having a fairly minimal negative impact, can be made less intrusive if all of the MX hosts use a common database for tracking delivery attempts. To illustrate this, let's take an example where we have several hosts listed as mail exchangers for a domain, each with a separate Greylisting database.

A legitimate sending relay with a retry time of an hour attempts to deliver to one of the listed MX hosts. This host has never seen this triplet before, so it generates a record for the triplet in its own Greylisting database and refuses to accept the mail. An hour passes, and the sending MTA, knowing that the last delivery attempt failed, decides not to retry the same MX host and picks a different one. This new MX host uses a separate database and does not know about the earlier attempt. Since it has not seen the triplet, it generates a new record in its own database and refuses delivery again.

From this example, it can be seen fairly easily that there is the possibility that the delay in delivery of a legitimate piece of mail may get significantly longer than expected if there are enough MX hosts in the mix, even to the point that the sending server may give up and bounce the mail.

To avoid this possible problem, it is STRONGLY suggested that when a domain has multiple MX hosts, they should all use a common database for tracking the mail triplets. There may be cases when the MX hosts are too widely separated (network-wise) to be able to do this efficiently and robustly, but even in those cases it is possible that Greylisting will still be useful enough that this worst-case scenario can be tolerated or worked around to minimize the impact.
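The multi-MX example above can be sketched as a small simulation. This is only an illustration under stated assumptions: the class and function names, the 300-second minimum delay, and the in-memory dictionary standing in for the database are all hypothetical and are not taken from the example implementation.

```python
MIN_DELAY = 300  # assumed: seconds a new triplet must wait before it is accepted


class GreylistDB:
    """In-memory stand-in for a Greylisting database (hypothetical)."""

    def __init__(self):
        # (relay IP, envelope sender, envelope recipient) -> time first seen
        self.first_seen = {}

    def check(self, triplet, now):
        """Return 451 (temporary failure) or 250 (accept) for this attempt."""
        seen = self.first_seen.setdefault(triplet, now)
        return 250 if now - seen >= MIN_DELAY else 451


def attempts_until_accepted(databases, triplet, retry_interval, max_attempts=10):
    """Simulate a sender that moves to the next MX host after each deferral.

    databases holds one GreylistDB per MX host, in the order the sender
    tries them. Passing the same object several times models a shared database.
    """
    for attempt in range(max_attempts):
        db = databases[attempt % len(databases)]
        if db.check(triplet, now=attempt * retry_interval) == 250:
            return attempt + 1
    return None  # the sender gave up


triplet = ("192.0.2.25", "sender@somedomain.com", "recipient@otherdomain.com")

# Three MX hosts, each with its own database.
separate_attempts = attempts_until_accepted(
    [GreylistDB(), GreylistDB(), GreylistDB()], triplet, retry_interval=3600)

# Three MX hosts that all consult one common database.
common = GreylistDB()
shared_attempts = attempts_until_accepted(
    [common, common, common], triplet, retry_interval=3600)
```

With three separate databases and an hourly retry, the mail is accepted only on the fourth attempt, once the sender comes back to the first host. With the common database it is accepted on the second attempt.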

Common spammer attack methods

This section details a few of the most prevalent spammer attack methods that were observed during the testing period, and how the Greylisting system deals with them.

Method 1: The non-primary MX attack

A significant number of spam emails specifically target the non-primary MX hosts for a domain. The reason is simple: backup MX servers will usually accept all of the spam and relay it to the primary MX host without checking it. This reduces the load on the spammer's system, requires little or no additional processing for mails that are rejected, and usually results in faster delivery transactions because the receiving system has to do less work (in the short term, while the attack is occurring).

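Greylisting handles this attack very well, since the whole point of the attack is to minimize bounces and delivery delays. A backup MX that uses Greylisting defers the first attempt like any other MX host, so the spammer gets exactly the delay and retry burden that the attack was meant to avoid.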

Method 2: The spam troll/Dictionary attack

Many spammers are now resorting to "trolling", that is, sending spams to common usernames (tom@, harry@) at domains (also known as a dictionary attack), or sending to generated usernames made from real names harvested from other sources. They usually seem to be operating from a dictionary of common user names, but the "generated" usernames tactic may be getting more common.

The spammers probably use this method in order to reach people who have either taken steps to try to keep their email address from being harvestable from the web, or who are fairly novice users that may not have the resources or inclination to create their own web pages. The latter case is the more likely target, since novice users are probably more likely to purchase something that has been advertised through spam.

This type of attack is very often combined with the non-primary MX attack, since most of these emails will result in bounces on domains that don't have a fairly large user population. Consequently, the spammers target the backup MX hosts. That way, they don't have to handle all the bounces and failures that these messages generate.

Greylisting handles these very well, since they almost always come from random short-lived dynamic IP addresses. And because most of these emails will ultimately generate bounces, it is costly for spammers to attempt redelivery of this type of attack. Also, since this attack is so distinctive (a high number of bounces generated in a short period of time from a particular IP address or set of addresses), it should be very easy to recognize and add to other blacklisting methods if given enough time to do so, which Greylisting provides.

Method 3: The organized distributed attack

Many spammer attacks seem to come in a pattern that looks very much like a moderated DDoS (Distributed Denial of Service). Let's call this type of spamming an "Organized Distributed Spammer Attack" (ODSA).

On the systems where spammer methods were evaluated, it was observed to be fairly common that there were spam delivery attempts that happened in a fairly short window of time, where the SMTP connections were originating from many different and seemingly unrelated IP addresses. Yet all of the envelope sender addresses were the same or similar, and the envelope recipient addresses were fairly sequential.

Obviously, Greylisting (as defined here) currently handles these attacks extremely well. However, if (when) the spammers adapt and learn to retry the delivery attempts, it may not be as effective by itself.

That being said, it is quite possible to adapt the Greylisting method to help thwart the described workaround. For example, at the cost of a little additional processing, it should be fairly simple to look at delivery attempts that have happened in a fairly recent time period, and after the first few attempts have been seen, submit all of the relays exhibiting this behavior to various blacklists as probable spam sites.
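One way the "look at recent delivery attempts" idea might be sketched is shown below. The function name, the 15-minute window, and the threshold of three relays are assumptions chosen for illustration. The heuristic itself (many unrelated relays sharing one envelope sender) is only one possible reading of the pattern described above, and it would need exceptions for large legitimate senders that use many outbound relays.

```python
from collections import defaultdict

WINDOW = 15 * 60  # assumed look-back window, in seconds
MAX_RELAYS = 3    # assumed: distinct relays tolerated per envelope sender


def suspicious_relays(attempts, now):
    """Return relay IPs that look like part of a distributed attack.

    attempts is an iterable of (timestamp, relay_ip, sender, recipient)
    tuples, for example rows read from the Greylisting database. A relay is
    flagged when its envelope sender was seen from more than MAX_RELAYS
    distinct relay IPs inside the window.
    """
    relays_by_sender = defaultdict(set)
    for timestamp, relay_ip, sender, _recipient in attempts:
        if now - timestamp <= WINDOW:
            relays_by_sender[sender].add(relay_ip)

    flagged = set()
    for relays in relays_by_sender.values():
        if len(relays) > MAX_RELAYS:
            flagged |= relays
    return flagged


# Five unrelated relays using one sender, plus one ordinary delivery attempt.
attempts = [
    (1000 + i, f"203.0.113.{i}", "offers@example.net", f"user{i}@otherdomain.com")
    for i in range(5)
]
attempts.append(
    (1000, "192.0.2.25", "sender@somedomain.com", "recipient@otherdomain.com"))

flagged = suspicious_relays(attempts, now=1200)
```

Here the five 203.0.113.x relays are flagged as candidates for blacklist submission, while the single ordinary relay is not.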

Method 4: The web proxy attack

A significant portion of spam seems to come from relays that appear to be CacheFlow Server or other types of proxies. These can usually be identified because they return "CacheFlowServer" in response to an ident probe.

Greylisting will block these particular attacks completely, since those servers are not "real" MTAs, and will never retry.

Possible methods of spammer adaptation

Greylisting as proposed is fairly immune to possible routes of adaptation by spammers to get around the blocking. The possible methods of adaptation may make Greylisting by itself less effective, but the ways of getting around it will only make other spamblocking methods more effective.

The normal spammer behavior is to change IP addresses when normal IP blacklists have listed their current one. Unfortunately for the spammers, changing their IP address does not help with our delaying method, as every mail (and its delay) is tied to the IP address of the sending relay. If the IP address changes, it effectively "resets" the timer on the delay, even if the envelope sender and recipient addresses stay exactly the same.

The other adaptation that is expected will result in the current versions of client spam software becoming obsolete, since most of those spamming applications are not intelligent enough to retry a delivery after getting any type of error. Spammers will be required to either use more intelligent software that retries, or to relay through smart relays.

We may see spammers gravitate toward using open third party relays, but most of them are already locked down or are quickly becoming so. Or, they may set up their own relays. In either case, it does nothing to negate the likelihood that those relays are or will quickly become listed in blacklists, thereby reducing their effectiveness for sending spam.

If spammers set up their own relays, the fact that email transmissions are delayed, and that each may take several attempts to deliver, only increases the storage and bandwidth requirements on the spammer's side, which also raises the spammer's costs. And if we can make it less profitable, then we are well on the way to solving the spam problem.

Implementation Caveats

The delaying tactic that is the core of Greylisting may cause undesired delays if the host it is running on allows clients with regularly changing IP addresses to relay mail through it. For example, if clients on non-local networks are allowed to relay through the server after doing a POP or IMAP auth, this implementation has no way of letting these clients deliver their mail for forwarding without incurring a probably undesired delay.

Workarounds for this issue exist, but are not implemented in the example code. Essentially all that is necessary to allow this without incurring a delay penalty is to simply insert a short-lived record into the Greylisting database at the same time that authorized relaying is enabled, which allows that originating IP address the ability to send mail for some small but sufficient amount of time.
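A minimal sketch of that workaround follows. It is not part of the example implementation: the class name, method names, and the 30-minute lifetime are assumptions, and a dictionary stands in for the short-lived record in the Greylisting database.

```python
AUTH_GRANT_SECONDS = 30 * 60  # assumed lifetime of the short-lived record


class RelayGrants:
    """Tracks client IPs that recently authenticated (hypothetical helper)."""

    def __init__(self):
        self._expires = {}  # client IP -> time at which the grant expires

    def on_authenticated(self, client_ip, now):
        """Call at the moment authorized relaying is enabled for the client."""
        self._expires[client_ip] = now + AUTH_GRANT_SECONDS

    def may_skip_greylisting(self, client_ip, now):
        """True while the client's grant is still valid."""
        expiry = self._expires.get(client_ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[client_ip]  # drop the stale record
            return False
        return True


grants = RelayGrants()
grants.on_authenticated("192.0.2.10", now=0)
```

The mail filter would consult may_skip_greylisting before applying the usual triplet check, so an authenticated client is not delayed while its grant lasts.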

Reception of mails from legitimate hosts that either do not pay attention to the temporary failure nature of the rejections, or never attempt any retries will be adversely affected by this system. Hopefully, any mailers that have these problems will be quickly fixed once Greylisting has been implemented at a significant number of sites.

Unfortunately, a few isolated systems with these issues have been discovered during testing. The affected systems either do a poor job of following the SMTP spec, or are outright violating it. Since SMTP is by nature an unreliable transport method, systems that do not retry deliveries are poorly advised and need to be fixed.

An SMTP session log generated by one specific example of a non-compliant MTA follows:

-> HELO somedomain.com
<- 250 Hello
-> MAIL FROM: <sender@somedomain.com>
<- 250 2.1.0 Sender ok
-> RCPT TO: <recipient@otherdomain.com>
<- 451 4.7.1 Please try again later
-> DATA
<- 551 No valid recipients

From this, it is fairly obvious that the sending MTA did not check the status returned by the RCPT command and continued on to issue DATA, which is not a valid step when no recipient addresses have been accepted. The receiving server therefore answered with a permanent failure code. This particular mailer did pay attention to the later 551 error code, which is considered a "permanent" failure, and bounced the message back to the sender. That behavior is incorrect, because the mailer failed to observe the earlier "temporary" failure and abort the transaction at that point.
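The decision the sending MTA should have made after the RCPT phase can be sketched roughly as follows. The function name and return values are hypothetical, and the logic is simplified: a real MTA with a mix of accepted and deferred recipients would send the message to the accepted ones and queue the rest for a later retry.

```python
def next_action(rcpt_replies):
    """Decide what a sending MTA should do once all RCPT TO replies are in.

    rcpt_replies is a list of integer SMTP reply codes, one per RCPT TO
    command. Returns "send_data", "retry_later", or "bounce".
    """
    if any(200 <= code < 300 for code in rcpt_replies):
        return "send_data"    # at least one recipient was accepted
    if any(400 <= code < 500 for code in rcpt_replies):
        return "retry_later"  # temporary failure: abort and requeue the mail
    return "bounce"           # only permanent failures were returned


# The session above: the single recipient was answered with 451.
action = next_action([451])
```

For the logged session this yields "retry_later", so a compliant mailer would have ended the transaction without issuing DATA and tried again later.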

An Example Implementation

The provided example implementation (available here) is a Perl-based milter for Sendmail, using version 0.18 of the Sendmail::Milter interface (also available from CPAN). It has been tested with Sendmail 8.12.9, though it should work with all versions of Sendmail after 8.12.5. Sendmail::Milter requires a threaded Perl installation and was tested with Perl 5.8.0 (available from perl.org or from CPAN).

Also available are the database definitions used for this implementation and a sample configuration file. Since the implementation is in Perl, it is easily modifiable. It is not available on CPAN (yet).

The database used was MySQL 3.23.54, though it should work with any later version, and most likely will work with earlier versions as well. In addition, the test systems were also using amavisd-new with the amavisd-new-milter interface, which was configured to do additional spamblocking with the help of SpamAssassin 2.53.

In the interests of keeping the example implementation simple and easy to understand, some features that could easily be optimized have been left in their unoptimized state. Even so, during testing under heavy spam loads, the added time for the checks was unnoticeable in most cases, and in the remaining cases, the cause was network delay in accessing the database (which was remotely hosted).

One detail of the implementation will probably strike horror in the hearts of diehard "structured" programmers. In several places, goto is used. Because of the way that the milter interface works, this seemed more straightforward than other methods.

Other details on the example implementation

Mails that have the null sender as their envelope sender are treated as a special case: once such a mail has been allowed through, its record is expired immediately in order to avoid whitelisting it. Mails from the null sender are (according to RFC 821) only to be used for special administrative mails like bounces. Consequently, a given null-sender record is almost never needed for more than one legitimate email. For that reason, there is no need to maintain it any longer once an email has been passed.

Unfortunately, many spammers are misusing this sender address because it generally won't generate a bounce from the recipient server (there's no point in generating a bounce message for a mail that is already a bounce). Expiring these records immediately helps limit the possibility that spammers using this sender address incorrectly can send multiple spams to the same recipient in a small time frame.
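The null-sender rule might be sketched as below. The function name and the 36-day whitelist lifetime are assumptions made for illustration, not values taken from the example implementation.

```python
WHITELIST_LIFETIME = 36 * 24 * 3600  # assumed: seconds a passed triplet stays valid


def expiry_after_pass(envelope_sender, now):
    """Return the new expiry time for a record whose mail was just accepted.

    Records for the null sender ("" or "<>") expire immediately, so the next
    mail with the same triplet is delayed again. Every other record is kept
    for the whitelist lifetime.
    """
    if envelope_sender in ("", "<>"):
        return now
    return now + WHITELIST_LIFETIME


bounce_expiry = expiry_after_pass("<>", now=1000)
normal_expiry = expiry_after_pass("sender@somedomain.com", now=1000)
```

A legitimate bounce still gets through after its one delay, while a spammer reusing the null sender has to wait out the delay again for every message.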

In addition, there are several other small features incorporated into the example implementation that are not part of the Greylisting system itself, but are attempts at enhancing or refining the general purpose of spam blocking.

The database layout used is not normalized. This was a conscious choice so that people who may not be that familiar with database design could more easily understand it. However, reworking the database implementation to normalize it should be fairly trivial.

One thing that is not incorporated is any kind of database maintenance. There is no provided method of inserting manual whitelisting entries other than the example SQL statements in the above dbdef.sql file. I expect that eventually a nice web CGI for maintaining the database will be written, but I haven't had time to create one yet. Or maybe someone will create one and share it.

If you are interested in the paper in its original first published form, the original can be found here.

Revised: 2003-08-21
