More than 40 years ago, on May 3, 1978, a computer vendor in the USA sent what is widely regarded as the first spam email in history, advertising a newly launched computer that became a big success.
Since then, spam has changed a lot. For decades it could easily be recognised by its poor design, clumsy sales pitch and numerous spelling mistakes. Today, however, spam mails are professionally designed and cover a wide range of topics. Spammers increasingly pick up on trends such as the rise of cryptocurrencies, and they write messages intended to intimidate or frighten recipients, or to appeal to their greed, desperation or simple curiosity.
Nowadays, however, the vast majority of spam emails have far less chance of reaching an email user's inbox, because spam filters are constantly evolving. In their simplest form, they work as follows: simple rules filter out messages that contain suspect words such as 'online pharmacy', 'Viagra' or 'Lottery Win' and that come from unknown or blacklisted IP addresses. But spammers can quickly adapt their messages to get around these barriers. Simply changing the spelling of a word is enough to outwit such rules. Depending on the font, a lowercase "l", an uppercase "I" and the digit "1" are barely distinguishable, so turning "Viagra" into "V1agra" is enough for the algorithm to no longer recognise the word. To make the filter catch this message, a new rule must be added to the system, and this has to be repeated for every new evasion trick the spammer comes up with. Analysing individual words alone is therefore no longer sufficient for reliable spam detection.
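The cat-and-mouse game described above can be illustrated with a minimal sketch. The phrase list, the function names and the look-alike mapping are all hypothetical and only meant to show why a hand-written keyword rule breaks as soon as one character is swapped, and why each fix is just another manual rule.

```python
# Hypothetical list of suspect phrases, as used by the simple filters described above.
SUSPECT_PHRASES = ["online pharmacy", "viagra", "lottery win"]


def naive_keyword_filter(text: str) -> bool:
    """Flag a message if any suspect phrase appears verbatim (case-insensitive)."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in SUSPECT_PHRASES)


# A hand-written "new rule": map look-alike characters back to letters before matching.
# Note that "1" is mapped to "i" only, so a "1" standing in for "l" still slips through.
# Every new evasion trick needs yet another manual entry here.
LOOKALIKES = str.maketrans({"1": "i", "|": "l", "0": "o", "@": "a"})


def patched_keyword_filter(text: str) -> bool:
    """Same check, but after normalising known look-alike characters."""
    normalised = text.lower().translate(LOOKALIKES)
    return any(phrase in normalised for phrase in SUSPECT_PHRASES)


if __name__ == "__main__":
    print(naive_keyword_filter("Cheap Viagra today"))    # True
    print(naive_keyword_filter("Cheap V1agra today"))    # False: the evasion works
    print(patched_keyword_filter("Cheap V1agra today"))  # True: the new rule catches it
```

The patched version catches "V1agra" but still misses "1ottery win", which is exactly the maintenance burden the paragraph describes.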
And this is where Machine Learning (ML), a branch of AI, comes into play: it allows computers to process data and learn from it without being explicitly programmed. An ML-based spam filter can learn in several ways, but it always has to be trained. One way is to feed it a large number of emails that have already been identified as spam. The ML algorithm examines them for recurring patterns that are highly likely to indicate spam and then automatically derives a new rule for the spam filter.
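As a rough illustration of "learning from labelled spam", here is a minimal naive Bayes classifier, one classic technique for this task. It is a sketch, not the method GMX uses: instead of literal rules, it learns how strongly each word is associated with spam or legitimate mail ("ham"). The class name and the tiny training set are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical sketch of a multinomial naive Bayes spam filter."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Record one labelled example ("spam" or "ham")."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens: list[str], label: str) -> float:
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior P(label)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        if not self.doc_counts["spam"] or not self.doc_counts["ham"]:
            raise ValueError("train on at least one spam and one ham example first")
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


if __name__ == "__main__":
    spam_filter = NaiveBayesSpamFilter()
    for text in ["win a free lottery prize now", "cheap pills online pharmacy",
                 "claim your free prize now"]:
        spam_filter.train(text, "spam")
    for text in ["meeting moved to monday", "please review the attached report",
                 "lunch on monday?"]:
        spam_filter.train(text, "ham")
    print(spam_filter.is_spam("free prize waiting"))  # True
    print(spam_filter.is_spam("see you monday"))      # False
```

Because the model is retrained from examples rather than edited by hand, adding newly reported spam to the training set updates its behaviour without anyone writing a new rule.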
How human intelligence spots unusual email communications
An experienced email security expert can assess the potential threat posed by a spam email far more comprehensively than a machine. To decide whether there is a genuine danger, the expert identifies the possible 'value chain', that is, how the spam is ultimately converted into cash. The spammer has one aim and one aim only: to get paid. The expert can ask: "What happens if someone clicks the link in this phishing email?", "How will the online fraudster get their money in real life?", "What banking method will they use?" Thinking this through takes a great deal of experience and highly specific expertise that, for now, only humans possess.
In addition to anti-spam specialists, there is a second human factor in evaluating spam: the user. From the user's point of view, spam falls into three categories. First, there is black spam: spam that is either rejected by the provider's email servers (because it is delivered by servers on blacklists) or detected as unwanted by spam filters, e.g. illegal advertising. Second, there is red spam, which contains malicious links (e.g. phishing) or even malware. For both categories, detection rates are very good across all major email providers, so users hardly ever see these emails.
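Rejecting black spam from blacklisted servers is commonly done with DNS-based blocklists (DNSBLs), following the convention documented in RFC 5782: the octets of the sender's IPv4 address are reversed, the list's zone is appended, and an answer in 127.0.0.0/8 means "listed". The sketch below assumes that convention; the zone `dnsbl.example.org` is a placeholder, and the article does not say which lists or mechanisms GMX actually uses.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a blocklist zone.

    Example: 192.0.2.99 in dnsbl.example.org -> 99.2.0.192.dnsbl.example.org
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the blocklist answers with an address in 127.0.0.0/8.

    `resolve` can be swapped out so the logic can be tested without network access.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record for this name: the address is not on the list.
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
    # A stand-in resolver that pretends every address is listed:
    print(is_listed("192.0.2.99", "dnsbl.example.org", resolve=lambda name: "127.0.0.2"))
```

A real mail server would run such a check while the sending server is still connected, so that black spam can be refused before it is ever accepted.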
Then there is a third category, "graymail", where users currently have an edge over machines. It is called "gray" (or "grey") because the sender is on neither the blacklist of blocked senders nor the user's whitelist of approved senders. This is email that your spam filter isn't quite sure what to do with until it has learned more about it, because some users mark it as spam and others don't. Emails from retailers are a typical example. The recipient technically opted in to receive them by 'engaging' with the retailer when making a purchase, but afterwards no longer wants to be bothered by them, so they always move these emails to the 'Junk' folder and perhaps block the sender as well. Over time, the spam filter learns what the recipient considers to be graymail, based on these actions as well as on the actions of all other recipients of emails sent from that particular domain. In the future, AI may be able to adjust and improve its reaction to this sort of spam proactively, based on such continuous feedback.
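The feedback loop described above, where one user's own actions are combined with those of all recipients of mail from the same domain, can be sketched as follows. Everything here is a hypothetical simplification: the class name, the thresholds and the rule "trust the user's own history once there is enough of it, otherwise fall back to the domain-wide junk rate" are assumptions made for illustration, not a description of any provider's system.

```python
from collections import defaultdict


class GraymailLearner:
    """Hypothetical sketch: learn junk rates per (user, domain) and per domain."""

    def __init__(self, min_user_votes=3, junk_threshold=0.7, keep_threshold=0.3):
        self.min_user_votes = min_user_votes
        self.junk_threshold = junk_threshold
        self.keep_threshold = keep_threshold
        self.user_votes = defaultdict(lambda: [0, 0])    # (user, domain) -> [junk, kept]
        self.domain_votes = defaultdict(lambda: [0, 0])  # domain -> [junk, kept]

    def record(self, user: str, domain: str, moved_to_junk: bool) -> None:
        """Store one user action on an email from `domain`."""
        idx = 0 if moved_to_junk else 1
        self.user_votes[(user, domain)][idx] += 1
        self.domain_votes[domain][idx] += 1

    def decide(self, user: str, domain: str) -> str:
        """Return "junk", "inbox" or "gray" for the next email from `domain`."""
        junk, kept = self.user_votes.get((user, domain), (0, 0))
        if junk + kept >= self.min_user_votes:
            rate = junk / (junk + kept)  # the user's own behaviour takes priority
        else:
            d_junk, d_kept = self.domain_votes.get(domain, (0, 0))
            if d_junk + d_kept == 0:
                return "inbox"  # no evidence about this domain yet
            rate = d_junk / (d_junk + d_kept)
        if rate >= self.junk_threshold:
            return "junk"
        if rate <= self.keep_threshold:
            return "inbox"
        return "gray"  # recipients disagree: classic graymail


if __name__ == "__main__":
    learner = GraymailLearner()
    for _ in range(3):
        learner.record("alice", "shop.example", moved_to_junk=True)
    learner.record("carol", "shop.example", moved_to_junk=False)
    learner.record("carol", "shop.example", moved_to_junk=False)
    learner.record("dave", "shop.example", moved_to_junk=False)
    print(learner.decide("alice", "shop.example"))  # junk: her own history decides
    print(learner.decide("bob", "shop.example"))    # gray: recipients are split 50/50
```

Giving a user's own history priority reflects the point that graymail is subjective: the same retailer newsletter can be junk for one recipient and welcome for another.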
Man AND Machine
AI speeds up spam detection and at the same time increases the hit rate, because it can evaluate huge amounts of data almost in real time. As mentioned above, it is based on machine learning, which relies on algorithms that learn from experience. Deep Learning, a sub-discipline of Machine Learning that uses artificial neural networks loosely modelled on the human brain, offers further potential. These networks can be trained to recognise patterns in the input data independently and to learn from their mistakes. However, AI has its limits, and this is where human expertise and strategic and creative thinking remain indispensable. In addition, hackers are becoming increasingly sophisticated at overcoming defence systems, which makes attacks harder to fend off, especially since the attackers also use AI.
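"Learning from mistakes" can be made concrete with the simplest possible artificial neuron, the perceptron, which adjusts its weights only when it misclassifies an example. This is a deliberately minimal stand-in and far simpler than the deep networks mentioned above; the three features (count of suspect words, number of links, whether the sender is known) and the training examples are hypothetical.

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Train a single artificial neuron on (features, label) pairs.

    Labels are +1 for spam and -1 for legitimate mail. Weights change only on errors.
    """
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != y:  # learn only from mistakes
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Classify one feature vector as +1 (spam) or -1 (legitimate)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


if __name__ == "__main__":
    # Hypothetical features: [suspect_word_count, link_count, sender_known (1 or 0)]
    training = [
        ([3, 2, 0], 1),
        ([2, 3, 0], 1),
        ([0, 1, 1], -1),
        ([0, 0, 1], -1),
    ]
    w, b = train_perceptron(training)
    print(predict(w, b, [4, 1, 0]))  # 1: classified as spam
    print(predict(w, b, [0, 0, 1]))  # -1: classified as legitimate
```

Deep networks stack many such units in layers and use gradient-based training rather than this simple update rule, but the underlying idea of correcting the model where it was wrong is the same.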
While AI is sometimes seen as a threat to human autonomy, humans and machines are better viewed as enhancing each other's strengths: 'humans plus machines', not 'humans versus machines'. This hybrid intelligence, grounded in human values, is the best way to increase AI adoption and boost productivity.
Jan Oetjen, Mail and Portal, GMX