Following decades of digitisation, businesses now face an increasingly wild, lawless expanse of internet space that sits beyond the frontier of corporate firewalls, proxies, and other traditional cyberdefences. An organisation’s presence on the internet can be viewed as its “attack surface”. It exists outside the domain of network security and has grown too large and complex for security teams to manage easily. As the internet expands, cybercriminals are evolving their approaches alongside it to exploit brands, consumers, and employees with relative impunity.
In their bid to stay at least one step ahead of businesses, threat groups have made automation their weapon of choice. One example from the struggle between security teams and threat groups is the automated production of randomly generated malware that looks different every time it is deployed.
Another example is the development of “fingerprinting” techniques that determine whether a user visiting one of their infected websites is a potential victim, a researcher, or a bot deployed by researchers. Some fingerprinting techniques identify the browser or IP address from which a user interacts with a site; others are time-based and measure how quickly things on a web page execute. This is significant because a bot may move far faster than a typical user, while a researcher analysing the page will move considerably slower. If anything arouses suspicion, the malware simply never fires, and the threat actors make a clean getaway.
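The time-based logic can be illustrated with a minimal sketch. The thresholds, labels, and function name below are purely illustrative assumptions for understanding the defensive problem, not real attacker code:

```python
def classify_visitor(page_event_times):
    """Classify a visitor from the timestamps (in seconds) of their page
    interactions: bots race through events, human analysts linger."""
    if len(page_event_times) < 2:
        return "unknown"
    # Average gap between successive interactions on the page
    gaps = [b - a for a, b in zip(page_event_times, page_event_times[1:])]
    avg_gap = sum(gaps) / len(gaps)
    if avg_gap < 0.1:        # sub-100ms between actions: almost certainly automated
        return "bot"
    if avg_gap > 30.0:       # long pauses: possibly a researcher stepping through the page
        return "researcher"
    return "likely_victim"   # only this class would ever see the payload
```

Only the "likely_victim" branch would trigger malicious behaviour, which is precisely what makes such sites hard for researchers and crawlers to observe.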
A further example of threat-actor ingenuity belongs to the credit-card-skimming cybercrime gang known as “Magecart”, deployed successfully against British Airways and other high-profile brands. The group automatically checked web pages for specific keywords to ensure that its skimmers fired only on the checkout page, the one page where the skimmer could perform its function properly, and where the airline’s customers unknowingly handed over their valuable credit card information.
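In essence, this keyword check is a few lines of logic. The sketch below is an illustrative reconstruction of the technique, with assumed keywords, written to show defenders what they are up against rather than to reproduce Magecart's actual code:

```python
def should_fire_skimmer(page_html, keywords=("checkout", "payment", "card number")):
    """Return True only when the page content looks like a checkout page,
    so the skimmer stays dormant (and invisible) everywhere else."""
    text = page_html.lower()
    return any(kw in text for kw in keywords)
```

By staying silent on every other page, the skimmer minimises its exposure to routine security scans of the site.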
With more automated threat infrastructure spun up every day and obfuscation techniques evolving, security teams need to respond in kind, using automation and machine learning in innovative ways that catch cybercriminals off guard. The key to this counterstrike will be the proper use of data. Detecting threats on the internet is about as easy as finding needles in a vast haystack, so it will require intelligent machine learning models that can identify and mitigate threats before they strike.
Luckily, efforts are already underway to collect as much data as possible about the internet as a whole. As it is collected, machine learning models can analyse the data to present the internet not as the infinite, chaotic place it appears to humans, but as the tidy graph of highly connected data points that it really is. Over time, threat actors will have nowhere to hide.
The modern cyberwarrior
While machine learning models are broad, fast, and tireless, humans are still needed to write the rules for them. This starts with a simple ‘decision tree’, but soon expands as more and more attacks are analysed, laying the groundwork for more nuanced threat detection. Eventually, the machine learning algorithms will have enough data to act with minimal human interaction and detect threats at internet scale. Human interaction remains key to machine learning, however, especially in the earlier stages of development, when models will be unsure how to categorise a particular instance and must be able to ask for help.
Any organisation setting up a machine learning algorithm as part of its security infrastructure needs to engage with it through what is known as active learning. This means that human analysts set the thresholds the model uses when making the binary decision of whether to flag instances observed within the security infrastructure. Without this guidance from an experienced human analyst, things become problematic rather quickly: incorrect assumptions made by those in charge of the model can prompt it to ignore real threats and lead to dangerous false negatives. On the other hand, well-functioning active learning creates a feedback mechanism that gives the model the ability to identify and surface questionable items on its own.
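A minimal sketch of such analyst-set thresholds might look as follows. The threshold values and labels are assumptions chosen for illustration; in practice each organisation would tune them to its own tolerance for false positives and negatives:

```python
def triage(threat_score, flag_threshold=0.9, review_threshold=0.6):
    """Route a model's threat score (0.0 to 1.0) using analyst-set thresholds.
    High-confidence detections are flagged automatically; the uncertain middle
    band is escalated to a human, whose verdict feeds back into the training
    set. This escalate-and-relabel loop is the active-learning feedback cycle."""
    if threat_score >= flag_threshold:
        return "flag"
    if threat_score >= review_threshold:
        return "ask_analyst"
    return "ignore"
```

The middle band is where the value lies: every item an analyst labels there makes the model less likely to need help with similar items in future.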
A new kind of machine learning
The dynamic nature of the evolving threat landscape means that collaboration and diversity are essential if ideas are not to stagnate. This very much applies to machine learning: it will not be enough for data scientists to reach for their previous ‘go-to’ algorithms when building models. Instead, blended models are needed, in which the base model marries two or more different perspectives. It is also possible to enhance the machine learning process by using two or more proven models together to classify unlabelled examples and escalating any disagreements to the active learning system. Providing ongoing feedback to the model is likewise key to avoiding the degradation it would otherwise suffer over time.
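The disagreement-escalation idea can be sketched in a few lines. The function and label names here are hypothetical, and real systems would compare probability scores rather than hard labels, but the routing logic is the same:

```python
def co_classify(example, model_a, model_b):
    """Blend two independently built classifiers: when they agree, accept the
    label automatically; when they disagree, hand the example to the active
    learning system so a human analyst can resolve it."""
    label_a = model_a(example)
    label_b = model_b(example)
    if label_a == label_b:
        return label_a
    return "escalate_to_analyst"


# Toy stand-ins for two trained models with different perspectives
url_model = lambda url: "malicious" if "skimmer" in url else "benign"
host_model = lambda url: "malicious" if "skimmer" in url else "suspicious"
```

Because the two models embody different perspectives, their disagreements tend to surface exactly the ambiguous cases most worth an analyst's time.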
Ultimately, cybercriminals and the infosecurity community will continue to fight to stay ahead of each other for the foreseeable future. The good news, however, is that even as attackers become increasingly sophisticated, the machine learning capabilities currently being developed by security teams across the world are making it possible for organisations to identify threats automatically at internet scale in a way hitherto impossible. If managed correctly, this will leave previously invisible bad guys with nowhere left to hide.
Adam Hunt, data science, data engineering and research, RiskIQ