Betting on the right horse: Why cybersecurity is like the Kentucky Derby

In security, experts fared no better than horseracing odds makers: the “favourite” (i.e. pre-guessed attack) didn’t win very often.

The Kentucky Derby takes place in a few days, on Saturday, May 6. It’s one of the biggest sporting events in the United States, with hundreds of millions of dollars bet both legally and off the books. Hundreds of firms employ many people whose job is to look at the data and estimate each entrant’s probability of winning. Most sports tracking sites rate it as a top 10 U.S. sports event, which is significant since, unlike basketball or football, few people are going to go into the backyard or driveway for some amateur horseracing after watching it on TV. My kids love to watch this race, and while we were reading about the horses and jockeys this week, it struck me that there are some interesting similarities between the Derby and cybersecurity.

First off, as mentioned above, there are hundreds of professionals looking at past data, recent performances, pictures, practices, etc. to assign odds on each horse. With so much effort and brainpower focused on assessing the odds, you’d think that betting with the probabilities - i.e. putting your money on the favourite - would be a smart play. In fact, betting on the pros’ predictions has been pretty awful. Historically, over the past 150 years, the odds-on favourite has won approximately 33 per cent of the time. That is, betting with the experts means you are losing your money two-thirds of the time.

Unlikely winners

Next, not only have the favourites been bad bets, but occasionally in these past 150 years there have been winners so unlikely that their odds were longer than 50-1. Despite the analysis and research, the actual winner never appeared on anyone’s radar beforehand. Finally, in recent years these trends have reversed. In the last three Derbies, the favourite horse actually won. Chalk it up to big data, better analytics, or improved processing power, but the odds makers in recent years have gotten much better.

Okay, so what does this have to do with cybersecurity? The parallels are interesting. Like the Derby, cybersecurity is a big deal. It draws in billions of dollars, thousands of professionals, and puts hundreds of billions of dollars at risk. The experts aren’t just pundits or consultants; the security firms employ thousands of engineers, researchers, and other experts with the job of guessing what an attack might look like. 

Many years ago, that expertise (i.e. that guessing) was embodied in signatures. Security products shipped full of signatures designed to detect specific types of attacks. In time, the experts couldn’t keep up, signatures were out of date or not installed quickly enough, and the experts lost. That is, the attackers won. Experts in the industry realised that pattern matching alone wouldn’t work, and added rules, such as the correlation rules found in a SIEM. An example might be: “if a user is logged in over the VPN from home and is also badged into HQ, something is wrong - fire an alert.” Rules captured more of the experts’ judgement and made guessing more effective.
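To make the VPN-and-badge example concrete, here is a minimal sketch of what such a correlation rule might look like in Python. It is an illustration only, not how any particular SIEM implements its rules: the `Event` type, the `vpn_login` and `badge_in` event kinds, and the one-hour window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """A simplified log record (hypothetical schema, for illustration only)."""
    user: str
    kind: str            # assumed event kinds: "vpn_login" or "badge_in"
    timestamp: datetime


def correlate_vpn_and_badge(events, window=timedelta(hours=1)):
    """Return a sorted list of users with a VPN login and a badge-in
    within `window` of each other (the one-hour default is an assumption)."""
    vpn_times = {}
    badge_times = {}
    for event in events:
        if event.kind == "vpn_login":
            vpn_times.setdefault(event.user, []).append(event.timestamp)
        elif event.kind == "badge_in":
            badge_times.setdefault(event.user, []).append(event.timestamp)

    alerts = []
    for user, logins in vpn_times.items():
        badges = badge_times.get(user, [])
        # Fire if any VPN login and any badge-in fall inside the window.
        if any(abs(login - badge) <= window for login in logins for badge in badges):
            alerts.append(user)
    return sorted(alerts)
```

The rule only catches the one scenario its author thought of in advance, which is exactly the limitation discussed next.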

However, with the fast-changing nature of attacks and the growing volume of data to be analysed, even the experts’ rules began failing most of the time. In security, experts fared no better than horseracing odds makers: the “favourite” (i.e. pre-guessed attack) didn’t win very often. Betting on the experts’ opinions was a losing play in cybersecurity for much of this decade. Take any of the well-known breaches in recent years. All of these companies had invested serious time, money, and effort in the expert intelligence tools, typically some form of SIEM, that were supposed to predict and detect these types of threats.

Instead, the opposite often happened. Take the Target breach. The culprit wasn’t an obvious target. It wasn’t an employee logging in on the VPN and also badging into the office. It wasn’t a Target employee who had suddenly started coming in early, staying late, or receiving formal warnings. Instead, an account from a partner company, specifically an HVAC contractor, was taken over by hackers and used to penetrate Target’s network. Would anyone calculate this type of attack at less than 50-1 odds? Perhaps with the benefit of hindsight, but almost certainly not at the time of the attack. So just like in the occasional Derby, the “winner” wasn’t predicted by any experts, and the expert rules embodied in the company’s security products didn’t help.

Something's changed

So if someone looked at the previous 150 Kentucky Derbies and excluded the most recent three, it would look pretty grim. Experts demonstrated no ability at all to properly predict the winner, and in fact, their odds often did the opposite, tamping down the chance of a correct prediction. In the same way, a CISO surveying the cybersecurity landscape in 2013 might feel pretty depressed. Attacks were happening regularly, were increasing in frequency and impact, and the existing products demonstrated no effective ability to predict (i.e. detect) what would happen. CISOs looked more like incompetent bookies who had to pay out on long shots, rather than effective experts who could manage the odds.

But in recent Kentucky Derbies, something changed. As noted above, the favourite has won the last three Derbies. The experts have gotten better at predicting winners and managing the odds (and associated costs). And in the same way, the tide has begun to turn in cybersecurity, driven by advances in three areas: open source big data management, machine learning, and computing price/performance. Unpacking the buzzwords, there has been real value where these three intersect.

As data volumes (log records, netflow, etc.) have skyrocketed, open source big data systems have provided a way to collect, process, and manage it all. In 2010, a market-leading log management appliance might contain 40 effective TB of storage, considered adequate for the compliance reporting needs of a company at the low end of the F1000. Today that same company generates 400 TB in a week. Its larger competitors might generate that much data in a day. Open source big data technologies such as HDFS and Elasticsearch enable solutions that handle petabytes of security data with ease.

Machine learning flips the expert approach on its head. Instead of requiring expert rule-writers to guess at attacks that might come, machine learning algorithms analyse trends, create behaviour baselines - on a per-user basis - and can detect new types of attacks very quickly using baselines and statistical models. These systems are more flexible and effective than their purely expert-driven predecessors. Finally, the amount of computing horsepower necessary to make all this work is a tenth of the cost of earlier platforms. It’s simply much cheaper to “throw iron” at the problem. The confluence of these three trends has made new security solutions much better at detecting threats.
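The per-user baselining described above can be sketched with a very simple statistical model. The example below is an assumption-laden illustration, not a description of any vendor’s algorithm: it uses a z-score against each account’s own history, and the function names, the threshold of three standard deviations, and the choice to flag accounts with no history are all hypothetical.

```python
from statistics import mean, stdev


def build_baselines(history):
    """Map each user to (mean, standard deviation) of their past daily activity.

    `history` is a dict of user -> list of daily counts (hypothetical input,
    e.g. number of hosts accessed per day). Users with fewer than two
    observations are skipped because no spread can be estimated.
    """
    return {
        user: (mean(counts), stdev(counts))
        for user, counts in history.items()
        if len(counts) >= 2
    }


def is_anomalous(baselines, user, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` standard deviations
    from that user's own baseline (the threshold is an assumed default)."""
    if user not in baselines:
        return True  # assumption: an account with no baseline deserves review
    mu, sigma = baselines[user]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold
```

Because each account is compared with its own history rather than with a pre-written rule, a contractor account that normally touches two or three hosts a day would stand out if it suddenly touched forty, even though nobody wrote a rule for that case.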

At the time of this writing, Classic Empire is the favourite to win this year’s Derby. Will he win? We won’t know until the race, but if recent trends continue, he’s a better bet than his predecessors of five or ten years ago. In the same way, today’s cybersecurity solutions aren’t perfect, but they are better - and continue to get better - than their predecessors. Better data, better analytics, and access to more “horsepower” continue to shift the odds in favour of CISOs.

Rick Caccia, Executive, Exabeam
Image Credit: Den Rise / Shutterstock