Saturday, June 12, 2010

The trouble with greylisting

Greylisting is one of several fairly common methods of preventing bulk spam from getting into a mail server. In short, the concept is based on the following idea: The receiving mail server is contacted by a sending server it has never seen before. Rather than accept the (possibly spam) message, it issues a message to this effect:
Dear sending mail server: I'm having a problem right now, and can't accept your message. Please try again later.
The thought is that, if the sending server is really serious about delivering the message, it will try again in a little while. Most bulk spam mail servers are not configured to retry, as they expect that most of the harvested addresses they attempt to deliver to are going to fail for one reason or another. A real mail server, however, will try back after a few minutes. At that time, the greylisting server will (in theory) recognize the retry attempt, accept the message, and make a note never to test that host with this rather rude procedure again.
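To make the idea concrete, here is a minimal sketch of a host-based greylist in Python. It is only an illustration under my own assumptions: the class name, the in-memory sets, and the choice of a 451 reply (one of the SMTP temporary-failure codes) are mine, and real greylisters typically track more than just the sending IP.

```python
class Greylist:
    """Toy host-based greylist: turn away a first contact, accept the retry."""

    def __init__(self):
        self.pending = set()  # hosts that have been turned away once
        self.known = set()    # hosts that retried and are no longer tested

    def check(self, ip):
        """Return the SMTP-style reply for a delivery attempt from ip."""
        if ip in self.known:
            return "250 OK"
        if ip in self.pending:
            # The host came back, so treat it as a real mail server.
            self.pending.discard(ip)
            self.known.add(ip)
            return "250 OK"
        # Never seen before: ask it to try again later.
        self.pending.add(ip)
        return "451 try again later"
```

A first call to `check` for a given address yields the temporary failure, and the second call from the same address is accepted.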

I have seen at least a few problems with this method.



Mail delivered by a cluster with multiple IPs

These days, the large e-mail providers (Hotmail/MSN, Yahoo, etc.) use multiple IP addresses to deliver e-mail, and the source IP address can vary on the next delivery attempt. In this case, the greylisting host will not recognize the attempt as a retry, and will "test" that server as well. In the best-case scenario, this repeats until all of the possible mail host IPs have been tested and stored, one of the earlier IPs comes around again, and the message is finally accepted and delivered (after a long delay). However, the sending host may also interpret this strange charade as a permanent problem with the receiving mail server and give up. In this case, the sender would receive a bounce message or NDR (non-delivery report).
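A rough way to see the cost is to count attempts. The sketch below assumes the sending cluster rotates through its addresses in strict round-robin order, which is a simplification of my own; real providers pick outbound IPs in less predictable ways. The function name and the `max_attempts` cutoff (standing in for the sender giving up) are hypothetical.

```python
def attempts_until_accepted(pool_size, max_attempts=50):
    """Count delivery attempts against a host-based greylist.

    The sender rotates round-robin through pool_size IPs. Returns the
    attempt number that is finally accepted, or None if the sender
    gives up after max_attempts tries.
    """
    seen = set()  # IPs the greylister has already turned away once
    for attempt in range(1, max_attempts + 1):
        ip = (attempt - 1) % pool_size
        if ip in seen:
            return attempt  # a previously tested IP came around again
        seen.add(ip)
    return None  # every attempt came from an IP that was new to the greylister
```

With a single IP the message goes through on the second attempt. With a pool of N addresses it takes N + 1 attempts under this model, and a sender that gives up sooner never delivers at all.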

Arbitrary minimum retry times

An immediate retry is not that expensive for a spam host, so most greylisters also enforce a minimum retry delay: any reattempt made within a certain time frame (usually around 5 minutes) is rejected again. This time frame may be unacceptable to some hosts, and unknown to others, again possibly causing them to give up because they are generally confused about what's going on.
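The delay rule can be sketched as a small decision function. The 300-second default reflects the "around 5 minutes" figure above; the function name, the return shape, and the use of plain numeric timestamps are my own assumptions for illustration.

```python
MIN_DELAY = 300  # seconds; roughly the 5-minute figure mentioned above


def decide(first_seen, now, min_delay=MIN_DELAY):
    """Return (reply_code, first_seen) for one delivery attempt.

    first_seen is the time of the host's first attempt, or None if the
    host has never been seen. Times are in seconds.
    """
    if first_seen is None:
        return "451", now          # first contact: record the time, tempfail
    if now - first_seen < min_delay:
        return "451", first_seen   # retried too soon: tempfail again
    return "250", first_seen       # waited long enough: accept
```

A sender whose retry schedule starts at one minute gets rejected a second time here, which is exactly the situation that can confuse it.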

Record lifetimes

The stored info about confirmed hosts usually has a lifetime, after which a server will need to be re-tested. This causes a delay to occur again in the future, and of course at that time the possibility exists that the process will fail for one of the above reasons, causing everyone to scratch their heads.
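The expiry check itself is tiny. In this sketch the 30-day lifetime is a made-up value for illustration, not a figure taken from any particular greylister, and the function name is hypothetical.

```python
LIFETIME = 30 * 24 * 3600  # hypothetical 30-day record lifetime, in seconds


def is_still_trusted(confirmed_at, now, lifetime=LIFETIME):
    """True while a host's confirmation record has not yet expired."""
    return confirmed_at is not None and now - confirmed_at < lifetime
```

Once this returns False, the host is treated as a stranger and goes through the whole test again.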

As the volume of mail on the internet increases, there will be more providers with clusters doing delivery, there will be more spam, there will be more people using techniques such as greylisting, and there will be more spammers finding ways to reduce the effectiveness of greylisting.

Conclusion

Greylisting has got to go. It's a stopgap measure that is based on the idea of fooling someone or something. Those kinds of solutions usually don't scale, and eventually fail.

What is the future of mail host authenticity checking?

I haven't researched this much, but why doesn't every "valid" mail host in the world have a public key listed in a worldwide registry database or available via DNS? There is already precedent for databases being part of the internet's infrastructure - such as the root.hints file for DNS, and arguably what people are already doing with RBL services such as spamhaus, cbl, etc.

Here's an example of how this would work:

Mail Host A contacts Mail Host B and tries to deliver a message in an "envelope" signed with his private key.

Mail Host B obtains the public key of Mail Host A (if it's not cached), probably via DNS.

Mail Host B verifies the authenticity of the signature against the public key.

Mail Host B knows whether Mail Host A really is who he says he is, and perhaps even whether he is worth listening to.
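The steps above can be sketched in a few lines of Python. To keep it dependency-free, this uses textbook RSA with a deliberately tiny key built from two known Mersenne primes, so it is a toy and not secure; a real deployment would use an established signing tool or library. The host name, the `PUBLIC_KEYS` dictionary standing in for a DNS or registry lookup, and all function names are hypothetical.

```python
import hashlib

# Toy key pair for Mail Host A. Both factors are known Mersenne primes,
# so this key is far too weak for real use.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to Host A

# Stand-in for a DNS or registry lookup: host name -> public key (N, E).
PUBLIC_KEYS = {"mail-a.example": (N, E)}


def digest(envelope, n):
    """Hash the envelope bytes and reduce the result into the key's range."""
    return int.from_bytes(hashlib.sha256(envelope).digest(), "big") % n


def sign(envelope):
    """Host A signs the envelope with his private key."""
    return pow(digest(envelope, N), D, N)


def verify(host, envelope, signature):
    """Host B fetches the claimed host's public key and checks the signature."""
    key = PUBLIC_KEYS.get(host)
    if key is None:
        return False  # no published key: the host cannot be authenticated
    n, e = key
    return pow(signature, e, n) == digest(envelope, n)
```

Nothing in `verify` looks at an IP address: the check depends only on the envelope, the signature, and the published key.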

I do realize that this is similar to SPF (Sender Policy Framework), but the thought of using GPG signing seems like a better way to do this. It would get around some of the inherent vulnerabilities and non-portability of depending on identifying certain mail server IP addresses. The signature that Mail Host A uses is totally independent of the IP address being used to deliver the message. As long as the private key is not compromised, the mail envelope can be trusted.