Is machine learning going to hack it in security?

(Image credit: Geralt / Pixabay)

Few concepts in security inspire as much debate as machine learning: technology capable of learning from, dissecting, and analysing data patterns at rates human analysts can only dream of. But not everyone is convinced. Advocates of machine learning see the technology as a revolutionary way of fighting security threats at unprecedented speed, even stopping attackers in their tracks before they have compromised a network. The more sceptical, or some would say rational, elements of the security community see it as a useful tool, but not a cure-all for the industry's woes.

Despite machine learning being a relatively new concept, the sceptics may have a point. If machine learning were as powerful a tool for combating security threats as proponents suggest, surely it would already have made a dent in the security issues organizations and individuals face. Yet cybercrime has been rising globally by nearly every metric: the severity, sophistication, and sheer volume of security incidents have all increased. So how can we tell whether machine learning represents a genuine watershed moment for security, or will end up as 'just another tool'?

The Potential

There are few who doubt the impact a program capable of learning can have on detecting and stopping malicious activity. Machine learning can, for example, address challenges such as intrusion detection by studying time stamps, IP addresses, and connection IDs, and assigning each piece of network activity a label: 'usual' or 'unusual.'
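To make the idea concrete, here is a minimal sketch of that kind of 'usual'/'unusual' labelling, using an unsupervised anomaly detector (scikit-learn's IsolationForest). The features (hour of connection, connections per minute, kilobytes transferred) and all thresholds are invented for illustration and do not describe any particular product:

```python
# Hypothetical sketch: labelling network events 'usual' or 'unusual'
# by training an anomaly detector on a baseline of normal traffic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated baseline of "normal" traffic: business hours, modest volumes.
normal = np.column_stack([
    rng.normal(13, 2, 500),      # hour of day
    rng.normal(20, 5, 500),      # connections per minute
    rng.normal(500, 100, 500),   # KB transferred
])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

def label(event):
    """Return 'usual' or 'unusual' for a (hour, conn/min, KB) event."""
    return "usual" if model.predict([event])[0] == 1 else "unusual"

print(label([14, 22, 480]))    # typical daytime traffic -> usual
print(label([3, 400, 90000]))  # 3 a.m. connection burst -> unusual
```

Note that the detector only learns what 'normal' looks like; it has no notion of intent, which is exactly the limitation the article returns to below.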

Insider threats can be dealt with in much the same manner. The technology can be programmed to identify individuals within a network who are downloading or printing excessive amounts of data, or logging on via remote connections outside of work hours, and to flag these as potentially dangerous activities. Of course, this presents the chance of false positives creeping in: some people printing, downloading, or accessing the network via a VPN could be perfectly legitimate users, working diligently out of hours or completing a legitimate but anomalous task. Put simply, machine learning can only detect unusual behaviour, not outright malicious behaviour.
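The insider-threat heuristics described above can be sketched as simple rules. The thresholds and field names here are entirely hypothetical, chosen only to show why a flag means 'unusual' rather than 'malicious':

```python
# Hypothetical sketch of insider-threat flagging: excessive downloads
# and out-of-hours remote logins are marked as anomalous. Thresholds
# are invented for illustration.
WORK_HOURS = range(8, 19)     # 08:00-18:59
DOWNLOAD_LIMIT_MB = 2000      # per-day ceiling before we flag

def flag_activity(event):
    """Return a list of reasons this event looks anomalous (may be empty)."""
    reasons = []
    if event["downloaded_mb"] > DOWNLOAD_LIMIT_MB:
        reasons.append("excessive download volume")
    if event["remote"] and event["login_hour"] not in WORK_HOURS:
        reasons.append("remote login outside work hours")
    return reasons

# An out-of-hours VPN session is flagged, but it may simply be a
# diligent employee working late: unusual, not necessarily malicious.
print(flag_activity({"downloaded_mb": 150, "remote": True, "login_hour": 23}))
```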

These are just two examples of the application of machine learning in the security space, but machine learning models are used every day for malware analysis, data manipulation, and spam filtering, to name but a few.

The Problem 

Machine learning, like any technology in its embryonic stage, has teething problems. Part of the machine learning “process” is labelling data, to provide “ground truth.” Sometimes, however, this cannot be done automatically – meaning data has to be manually labelled by a member of staff in order to train the machine learning system to engage with the data set and work effectively.

This means the program is only as good as the human who labelled the data, and it's more than likely there will be some degree of human error in the baseline model. Relying entirely on machine learning can be dangerous when a model scores highly on accuracy metrics but was trained on an inaccurate data set.

Machine learning and artificial intelligence have shown encouraging results when they are allowed to do what they were designed to do. In the world of security, however, there's the challenge of "adversarial machine learning," where a model is actively attacked in order to undermine its capabilities. Machine learning models can be tricked into making incorrect decisions. This is done by inserting malicious data into the learning process, allowing attackers to edit, confuse, and disrupt the model's decision-making – and it's often not too difficult.
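One common form of the attack described above is training-data "poisoning." The sketch below, using a synthetic data set and a simple scikit-learn classifier (both stand-ins, not any real security product), shows how flipping a slice of training labels measurably degrades a model's accuracy:

```python
# Hypothetical sketch of label-flipping poisoning: an attacker who can
# tamper with the training data corrupts the labels of one class,
# biasing the trained model's decisions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, y_train = X[:400], y[:400].copy()
X_test, y_test = X[400:], y[400:]

clean_acc = LogisticRegression(max_iter=1000).fit(
    X_train, y_train).score(X_test, y_test)

# Attacker flips 150 positive training labels to negative.
idx = np.where(y_train == 1)[0][:150]
y_train[idx] = 0

poisoned_acc = LogisticRegression(max_iter=1000).fit(
    X_train, y_train).score(X_test, y_test)

print(f"clean: {clean_acc:.2f}, poisoned: {poisoned_acc:.2f}")
```

The poisoned model still trains without errors, which is part of the danger: nothing in the pipeline signals that the ground truth itself was tampered with.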

Machine learning models can also be reverse-engineered. Researchers demonstrated in 2016 that they could replicate an Amazon ML model with near-perfect accuracy, simply by logging its responses to a few thousand queries. This has obvious implications: attackers can expose training data and threat intelligence simply by probing publicly available algorithms, and then design attacks so that they appear normal to the ML-driven analysis.
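The mechanics of that model-stealing technique can be sketched in a few lines: query a black-box model, log its answers, and train a local surrogate on them. The victim model, surrogate, and data below are synthetic stand-ins for illustration, not a reconstruction of the 2016 Amazon experiment:

```python
# Hypothetical sketch of model extraction: an attacker who can only see
# a model's predictions trains a surrogate that mimics its behaviour.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
victim = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Attacker submits a few thousand queries and logs the responses.
rng = np.random.default_rng(0)
queries = rng.normal(size=(3000, 5)) * X.std(0) + X.mean(0)
answers = victim.predict(queries)          # logged black-box responses

surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(
    queries, answers)

# Measure how often the stolen copy agrees with the original.
probe = np.random.default_rng(1).normal(size=(3000, 5)) * X.std(0) + X.mean(0)
agreement = (surrogate.predict(probe) == victim.predict(probe)).mean()
print(f"surrogate agrees with victim on {agreement:.0%} of probes")
```

With a faithful surrogate in hand, an attacker can probe it offline at leisure to find inputs the real system will classify as normal.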

Furthermore, machine learning is likely to suffer from one of the very problems for which it has been lauded as a potential solution: the omnipresent cybersecurity skills gap. As discussed, for machine learning systems to be effective, they need to be carefully programmed by skilled security professionals. But as cybersecurity becomes more and more necessary, there are fewer and fewer people with whom the burgeoning cybersecurity industry can be staffed, and therefore fewer people to set up ML systems. According to IDG, the proportion of IT staff in hiring positions who reported hiring difficulties rose from 23% in 2014 to 51% in 2018. Likewise, Cybersecurity Ventures has suggested that by 2021 there will be 3.5 million unfilled cybersecurity positions.

While much has been said about the transformative potential of machine learning to overhaul the current cybersecurity workforce and plug the skills gap once and for all, it is not yet a viable option, for all the reasons above. Machine learning programs, while incredibly powerful tools, remain just that: tools. Any attempt to replace nuanced human thinking with machine learning technology is surely destined for failure. If an employee is working diligently but in an anomalous way, as mentioned above, only a human analyst with the depth of understanding to interpret the data will see it for what it is. As a method of filtering the plethora of alerts security teams are subjected to daily, machine learning's potential is both exciting and genuine. Even so, a human still needs to be involved to separate behaviour that is simply unusual from behaviour that is dangerous. After all, not all anomalies are malicious, and not all threats appear anomalous.

Moving forward 

Machine learning should be treated with a mixture of scepticism and optimism, as all new technologies should be. The potential for well-programmed machine learning tools to relieve security professionals of some of the more mundane aspects of their work is huge. Machine learning, however, is still only one (albeit impressive) piece of the complex puzzle that is keeping us safe online.

Giovanni Vigna, CTO and Co-Founder of Lastline 

Giovanni Vigna
Professor Giovanni Vigna is CTO and co-founder of Lastline, and a Professor of Computer Science at the University of California, Santa Barbara.