Machine learning is critical to the future of cybersecurity and to helping security teams overcome the challenges of modern cyber attacks. Indeed, its ability to ‘outthink’ humans can boost return on investment (ROI), drastically improve productivity and minimise resource expenditure. However, machine learning is not a ‘set and forget’ solution. In fact, companies need to treat machine learning like an intern on their first day. Security teams should not assume a machine learning programme can hit the ground running – there needs to be an onboarding process in which they check in on the models frequently and spend time getting them started in the right direction.
Machine learning – the onboarding processes
Machine learning models are fast, tireless and retentive, but they often lack common sense. Just like an intern on their first day, machine learning is not going to understand how the organisation works, nor the concepts it will eventually master. Therefore, with any machine learning project, there must be an onboarding process.
To start with, machine learning models need to be checked frequently, and a lot of time must be spent getting them started in the right direction. Indeed, machine learning is unable to think critically, which is why humans need to be closely involved when it comes to cybersecurity. Models are low-level workers that cannot see the bigger picture and, as such, need to be continually spoon-fed instructions.
Over time, machine learning models will see patterns based on feedback and will learn to see what security teams want them to see. The more the models learn, the less human monitoring they will require, but they should never be completely autonomous in cybersecurity. They do not see things or follow a thought process the way a human brain would. They can quickly stray from the task at hand, sending the entire programme into disarray.
There are four ways security teams can make the most of a machine learning programme:
1. Implement safety nets and monitoring
Before building a pipeline, it is critical that security teams make sure the proper safety nets are in place – the first of which is called a ‘tripwire’. If the model classifies more instances than expected within a certain period, the tripwire will automatically disable it. This measure is critical to prevent the model from running out of control.
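As a rough illustration of the tripwire idea, the sketch below counts classifications inside a sliding time window and switches the model off once the count exceeds a limit. The class name, the parameters and the use of plain timestamps in seconds are all assumptions made for this example, not part of any particular product.

```python
from collections import deque


class Tripwire:
    """Hypothetical safety net: disables a model that classifies too much, too fast."""

    def __init__(self, max_classifications, window_seconds):
        self.max_classifications = max_classifications  # expected upper bound per window
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recent classifications
        self.enabled = True

    def record(self, now):
        """Register one classification at time `now` (in seconds).

        Returns True while the model is still allowed to run.
        """
        if not self.enabled:
            return False
        self._timestamps.append(now)
        # Forget classifications that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        # Trip the wire when the count exceeds expectations.
        if len(self._timestamps) > self.max_classifications:
            self.enabled = False
        return self.enabled
```

In this sketch the tripwire stays tripped once triggered, so a person has to review what happened before the model is switched back on.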
Going rogue is extremely common for machine learning models when they are first released. While security teams might have provided their initial model with a pristine data set from which to learn, the real world is somewhat different. The model will encounter things that did not appear in its textbook, causing it to default to biases formed through its training data. For example, if the training data contains only cats and dogs, the model, when presented with a fish, will try to classify it as a cat or a dog. The model will need to be corrected, learn from its mistakes, and try again.
The next safety net is a ‘whitelist’: a list of items the model should ignore. In a perfect world, you would not need whitelists, because you would invest the time in engineering better features and retraining your model until it gets a specific example right. However, when security teams need to act immediately, they will be thankful for having them. While not ideal, whitelists not only prevent the current model from classifying an instance incorrectly, but also help future models.
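A whitelist check can be as simple as a lookup that runs before the model is consulted. The sketch below assumes the items are domain names and uses a deliberately naive stand-in model; both the function names and the example domains are hypothetical.

```python
def naive_model(domain):
    """Hypothetical stand-in model: flags any domain containing 'login'."""
    return "threat" if "login" in domain else "not a threat"


def classify(domain, model, whitelist):
    """Skip the model entirely for whitelisted items; otherwise defer to it."""
    if domain.lower() in whitelist:
        return "ignored"
    return model(domain)


# Entries the team has already reviewed and wants the model to leave alone.
whitelist = {"login.example.com"}

print(classify("login.example.com", naive_model, whitelist))  # whitelisted
print(classify("login.evil.test", naive_model, whitelist))    # goes to the model
```

Because each whitelist entry records a case the model got wrong, the same list could later be reused as labelled examples when a new model is trained.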
2. Prevent degradation of machine learning models
The machine learning model may work at first, but without proper feedback, its performance will degrade over time. The precision during the first week will be better than in the tenth week. How long it takes the model to degrade to an unacceptable level depends on the team's tolerance and the model's ability to generalise to the problem. To enable the model to keep up with current trends, selecting an instance-based model, or one that can learn incrementally, is critical. Just as providing frequent feedback helps an employee learn and grow, the model needs the same kind of assessment.
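To make ‘instance-based’ concrete, here is a minimal sketch of a one-nearest-neighbour model, chosen purely for illustration. It stores labelled instances and predicts the label of the closest one, so each piece of analyst feedback takes effect immediately without a full retrain. The class name, the two-number feature vectors and the labels are assumptions for the example.

```python
class NearestNeighbourModel:
    """Toy instance-based model: predicts the label of the closest stored instance."""

    def __init__(self):
        self.instances = []  # list of (features, label) pairs

    def add_feedback(self, features, label):
        """Store a newly labelled instance; no retraining step is needed."""
        self.instances.append((tuple(features), label))

    def predict(self, features):
        if not self.instances:
            raise ValueError("model has no labelled instances yet")

        def squared_distance(instance):
            stored, _ = instance
            return sum((a - b) ** 2 for a, b in zip(stored, features))

        _, label = min(self.instances, key=squared_distance)
        return label


model = NearestNeighbourModel()
model.add_feedback((0, 0), "not a threat")
model.add_feedback((10, 10), "threat")
before = model.predict((4, 4))          # closest stored instance is (0, 0)
model.add_feedback((4, 4), "threat")    # analyst corrects the model
after = model.predict((4, 4))           # the correction applies straight away
```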
3. Enable the ability to actively learn
This point places an expert in the loop. When the model is unsure how to categorise a certain instance, having the ability to ask for help is critical. Models typically provide a probability or score with their predictions, which gets turned into a binary decision based on a given threshold – for example, a threat or not a threat. With no guidance, things get problematic quickly.
Left to its own devices, a model, like an employee, may make an incorrect assumption. In a case where the instance was just below the cut-off, but the threat was real, the model will continue to ignore it, resulting in a potentially serious false negative. However, if it chooses to act, the model will continue to flag benign instances, generating a flood of false positives. Developing a feedback mechanism that provides the model with the ability to identify and surface questionable items is critical to its success. Put simply, it is critical that models are kept up to date.
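One simple way to surface questionable items is to treat scores close to the threshold as ‘unsure’ and route them to an analyst instead of forcing a binary decision. The sketch below assumes scores between 0 and 1; the threshold of 0.5 and the margin of 0.1 are arbitrary illustrative values, not recommendations.

```python
def triage(score, threshold=0.5, margin=0.1):
    """Turn a model score into a decision, asking for help near the cut-off.

    `threshold` and `margin` are illustrative defaults; a real deployment
    would tune them to the team's capacity and risk tolerance.
    """
    if abs(score - threshold) < margin:
        return "ask analyst"
    return "threat" if score >= threshold else "not a threat"


for score in (0.05, 0.45, 0.55, 0.95):
    print(score, triage(score))
```

The analyst's answers on those borderline items are the most valuable labels to feed back into the model, since they sit exactly where it is least certain.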
4. Blending and co-training
Everyone knows collaboration and diversity help organisations grow. When a CEO surrounds themselves with ‘yes-men’, or a lone wolf decides they can do better by themselves, ideas can often stagnate. Machine learning models are no different. Every data scientist has a ‘go-to’ algorithm to train their models. As such, it is important not only to try other algorithms, but to do so as a team.
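Blending can take many forms; one of the simplest is to average the scores of several different models so that no single ‘go-to’ algorithm decides alone. The sketch below assumes each model is a function returning a threat score between 0 and 1, and the three scorers are hypothetical stand-ins rather than real detection rules.

```python
def blend_scores(item, models):
    """Average the scores that several independent models assign to one item."""
    if not models:
        raise ValueError("at least one model is required")
    scores = [model(item) for model in models]
    return sum(scores) / len(scores)


# Hypothetical stand-in scorers, each looking at the item from a different angle.
def length_scorer(domain):
    return 1.0 if len(domain) > 20 else 0.0


def keyword_scorer(domain):
    return 1.0 if "login" in domain else 0.0


def digit_scorer(domain):
    return 1.0 if any(ch.isdigit() for ch in domain) else 0.0


models = [length_scorer, keyword_scorer, digit_scorer]
blended = blend_scores("login-secure-update77.example.test", models)
```

A single scorer firing on its own only moves the blended score by a third, which is the point: the models check one another in the same way a diverse team would.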
A future with machine learning
Ultimately, companies are operating in a data-driven society where humans cannot go it alone. With some work, machine learning can be used to leverage employees’ knowledge and abilities to fill a necessary gap in the talent pool. However, machine learning models are not something that can be set and forgotten. They need frequent feedback and monitoring to provide the best performance. It is crucial to make providing that feedback easy – investing time will pay dividends.
Fabian Libeau, EMEA VP, RiskIQ
Image Credit: Methodshop / Pixabay