The trouble with greylisting

Greylisting is one of several fairly common methods of preventing bulk spam from getting into a mail server. In short, the concept is based on the following idea: The receiving mail server is contacted by a sending server it has never seen before. Rather than accept the (possibly spam) message, it issues a message to this effect:
Dear sending mail server: I'm having a problem right now, and can't accept your message. Please try again later.
The thought is that, if the sending server is really serious about delivering the message, it will try again in a little while. Most bulk spam mail servers are not configured to retry, as they expect that most of the harvested addresses they attempt to deliver to are going to fail for one reason or another. A real mail server, however, will try back after a few minutes. At that time, the greylisting server will (in theory) recognize the retry attempt, accept the message, and make a note never to test that host with this rather rude procedure again.
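To make the idea concrete, here is a minimal sketch of that logic in Python. It is not taken from any real greylisting package: the `Greylister` class is hypothetical, it keys only on the sending IP address, and the choice of a 451 reply is an assumption (any SMTP 4xx temporary-failure code would convey the same "try again later" message).

```python
class Greylister:
    """Toy greylister keyed on the sending host's IP address (hypothetical)."""

    def __init__(self):
        self.seen = set()     # hosts that have been deferred once
        self.trusted = set()  # hosts that came back and are no longer tested

    def check(self, ip):
        """Return the SMTP-style reply this server would give to `ip`."""
        if ip in self.trusted:
            return "250 OK"
        if ip in self.seen:
            # The host retried, so treat it as a real mail server from now on.
            self.seen.discard(ip)
            self.trusted.add(ip)
            return "250 OK"
        # First contact: defer with a temporary failure.
        self.seen.add(ip)
        return "451 Try again later"
```

A host that never retries stays in `seen` forever and its message is never accepted, which is the whole point of the technique.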

I have seen at least a few problems with this method.



Mail being delivered by a cluster having multiple IPs

These days, the large e-mail providers (Hotmail/MSN, Yahoo, etc.) use multiple IP addresses to deliver e-mail, and the source IP address can vary on the next delivery attempt. In this case, the greylisting host will not recognize it as being a retry, and will "test" that server as well. In the best case scenario, this repeats until all of the possible mail host IPs have been tested & stored, one of the earlier IPs comes around again, and the message is finally accepted/delivered (after a long delay). However, this can also result in the sending host interpreting this strange charade as a permanent problem with the receiving mail server, and giving up. In this case, the sender would receive a bounce message or NDR (non-delivery report).
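A rough simulation shows how quickly this gets out of hand. The function below is a sketch under two assumptions of mine: the sending cluster rotates through its IPs in strict round-robin order (real providers may pick them differently), and the sender gives up after `max_attempts` tries. Both names are hypothetical.

```python
from itertools import cycle


def attempts_until_accepted(pool, max_attempts):
    """Return the attempt number on which a greylister keyed on IP accepts
    the message, or None if the sender gives up first.

    `pool` is the list of source IPs the sending cluster rotates through.
    """
    seen = set()
    source = cycle(pool)  # assumed round-robin rotation
    for attempt in range(1, max_attempts + 1):
        ip = next(source)
        if ip in seen:
            return attempt  # a previously tested IP came around again
        seen.add(ip)        # new IP: deferred and remembered
    return None             # the sender ran out of patience
```

With a single IP the message gets through on the second attempt. With a pool of four IPs it takes five attempts, and if the sender only tries four times, the function returns `None` and the message bounces.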

Arbitrary minimum retry times

To get around the problem of an immediate retry, which is not that expensive to a spam host, most greylisters also implement a minimum retry delay, which will continue to reject reattempts within a certain time frame (usually around 5 minutes). This time frame may be unacceptable to some hosts, and unknown to others, again possibly causing them to give up because they are generally confused about what's going on.
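Adding the minimum delay to a greylister might look like the sketch below. The `DelayedGreylister` class is hypothetical, the 300-second default simply mirrors the "around 5 minutes" figure above, and timestamps are passed in as plain seconds so the behavior is easy to follow.

```python
class DelayedGreylister:
    """Toy greylister that rejects retries arriving too soon (hypothetical)."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # seconds a host must wait before retrying
        self.first_seen = {}        # ip -> time of its first attempt
        self.trusted = set()

    def check(self, ip, now):
        """Return the reply for a delivery attempt from `ip` at time `now`."""
        if ip in self.trusted:
            return "250 OK"
        # Record the first attempt, or fetch the time it was recorded.
        first = self.first_seen.setdefault(ip, now)
        if now - first >= self.min_delay:
            self.trusted.add(ip)
            return "250 OK"
        return "451 Try again later"
```

A sender that retries after 60 seconds is deferred again, and nothing in the reply tells it how long it should have waited.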

Record lifetimes

The stored info about confirmed hosts usually has a lifetime before a server will need to be re-tested. This causes a delay to occur again in the future, and of course at that time the possibility exists that the process will fail for one of the above reasons, causing everyone to scratch their head.
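The expiry itself is simple to picture. In this sketch the `TrustedHosts` class is hypothetical and the 30-day lifetime is purely an assumed figure; actual greylisters choose their own.

```python
LIFETIME = 30 * 24 * 3600  # assumed record lifetime: 30 days, in seconds


class TrustedHosts:
    """Toy store of confirmed hosts whose records expire (hypothetical)."""

    def __init__(self, lifetime=LIFETIME):
        self.lifetime = lifetime
        self.confirmed_at = {}  # ip -> time the host passed the retry test

    def confirm(self, ip, now):
        self.confirmed_at[ip] = now

    def is_trusted(self, ip, now):
        stamp = self.confirmed_at.get(ip)
        if stamp is None:
            return False
        if now - stamp > self.lifetime:
            # Record expired: the host will be deferred and tested again.
            del self.confirmed_at[ip]
            return False
        return True
```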

As the volume of mail on the internet increases, there will be more providers with clusters doing delivery, there will be more spam, there will be more people using techniques such as greylisting, and there will be more spammers finding ways to reduce the effectiveness of greylisting.

Conclusion

Greylisting has got to go. It's a stopgap measure that is based on the idea of fooling someone or something. Those kinds of solutions usually don't scale, and eventually fail.

What is the future of mail host authenticity checking?

I haven't researched this much, but why doesn't every "valid" mail host in the world have a public key listed in a worldwide registry database or available via DNS? There is already precedent for databases on the internet being part of the infrastructure, such as the root.hints file for DNS, and arguably what people are already doing with RBL services such as Spamhaus, CBL, etc.

Here's an example of how this would work:

Mail Host A contacts Mail Host B and tries to deliver a message in a signed "envelope", using his private key.

Mail Host B obtains the public key of Mail Host A (if it's not cached), probably via DNS.

Mail Host B verifies the authenticity of the signature against the public key.

Mail Host B knows whether Mail Host A really is who he says he is, and perhaps even whether he is worth listening to.
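The steps above can be sketched in a few lines of Python. To be clear about what is assumed: this is not GPG, and it is not secure. It uses textbook RSA with two small, publicly known primes and no padding, purely so the example runs with the standard library alone. The `PUBLIC_KEYS` dictionary is a stand-in for the registry or DNS lookup, and the host name `mail-a.example` is made up.

```python
import hashlib

# Toy key pair for Mail Host A. These are well-known Mersenne primes and are
# far too small for real use; a real deployment would use a vetted crypto library.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def digest(envelope: bytes) -> int:
    """Hash the envelope and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(envelope).digest(), "big") % N


def sign(envelope: bytes) -> int:
    """Mail Host A signs the envelope with its private key."""
    return pow(digest(envelope), D, N)


# Stand-in for the worldwide registry or DNS record holding public keys.
PUBLIC_KEYS = {"mail-a.example": (E, N)}


def verify(host: str, envelope: bytes, signature: int) -> bool:
    """Mail Host B looks up the claimed host's public key and checks the signature."""
    key = PUBLIC_KEYS.get(host)
    if key is None:
        return False  # no published key: the host cannot be authenticated
    e, n = key
    return pow(signature, e, n) == digest(envelope)
```

Something like `verify("mail-a.example", envelope, sign(envelope))` succeeds no matter which IP address the connection arrived from, while a tampered envelope or a host with no published key fails the check.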

I do realize that this is similar to SPF (Sender Policy Framework), but the thought of using GPG signing seems like a better way to do this. It would get around some of the inherent vulnerabilities and non-portability of depending on identifying certain mail server IP addresses. The signature that Mail Host A uses is totally independent of the IP address being used to deliver the message. As long as the private key is not compromised, the mail envelope can be trusted.
