Automating cloud security: How to leverage machine learning to beat the bad guys

For cybersecurity pros who fight unseen cloud adversaries every day, the rise of faceless security automation might feel uncomfortably familiar. But is automation yet another shapeless enemy, this time intent on slashing cybersecurity jobs? Or is it something else? 

Fortunately, the demand for cloud security experts far outstrips the supply. The smart players don’t fear automation; they embrace it as a better way to catch the bad guys. For them, creating the ideal analyst/machine partnership is the mission. But how, exactly, can automation and machine learning (ML) best contribute to cloud security workflows? 

To get the answer, take a look at how an analyst typically responds to a breach. It usually goes something like this: 

  • Collect data - gather information on what happened on each entity during the breach  
  • Develop insights - analyze the indicators of compromise to spot other potential compromises and to establish the sequence of events (also known as the “cyber kill chain”)  
  • Apply wisdom - apply deep understanding of the organization’s technology, people, and processes to analyze vulnerabilities (which are seldom exclusively technical)  
  • Take action - create durable and effective solutions by blending technical expertise and interpersonal skills 

The last two steps of the workflow are interesting and demanding work. Vulnerabilities can be found anywhere: from software deep within your data center to the human sitting in the chair of your call center. Sometimes the fix is as simple as a software patch, and sometimes it’s as complex as an organization-wide change or a new staff training initiative. These problems are often complex, and they can’t be solved by machine learning. 

In contrast, tasks associated with data collection and analysis have historically been tedious exercises in brute-force log and event correlation. Sometimes the task is so monumental, and the systems under analysis so ephemeral, that the investigation never goes anywhere. Using machine learning and automation for these steps in the workflow allows security professionals to start their work with a clear understanding of where the attack started, how it made its way through the infrastructure, and what data, servers, and systems it compromised. Machine learning gives cloud security pros a fighting chance at fixing the vulnerability. 

A growing number of machine learning options are available. To find the one that best contributes to a breach response workflow, consider these three questions: 

What does the model know? 

You don’t need a doctorate in data science to know that an ML model is only as good as its inputs (in the ML world, inputs are called “features,” but we’ll stick with inputs here to avoid confusion with product features). If you’re predicting real estate prices, for example, your model’s not going to work very well without the property’s square footage. And if you want to catch East-West threats, your solution will disappoint if it only monitors traffic at the network boundary. 
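
To make that concrete, here’s a minimal sketch in Python (using synthetic data invented purely for illustration) of how withholding a critical input degrades an otherwise identical model:

```python
# Illustrative only: synthetic housing data, not a real pricing model.
# The same regression algorithm performs far worse when a key input
# (square footage) is withheld from the feature set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
sqft = rng.uniform(600, 3500, n)            # square footage: the critical input
bedrooms = rng.integers(1, 6, n)            # a much weaker signal
price = 150 * sqft + 10_000 * bedrooms + rng.normal(0, 20_000, n)

X_full = np.column_stack([sqft, bedrooms])  # model sees everything
X_poor = bedrooms.reshape(-1, 1)            # model never sees square footage

for label, X in [("with sqft", X_full), ("without sqft", X_poor)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    print(f"{label:>12}: R^2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```

The same logic applies to security: a model that never sees east-west traffic or process launches has nothing to learn from.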

Insight quality depends on input quality. Inputs must be comprehensive and detailed enough to help the solution understand connections (between apps, processes, and data, both internally and externally), relationships (high-level entity groups and functions), and behaviors (cause and effect patterns and sequences). 

Today’s cloud security solutions gather inputs in one of three ways: 

Scrape log files - cloud entities generate millions of log events that can be used as ML inputs. Unfortunately, logs are imperfect: they vary wildly between vendors, log data is often incomplete or ill-suited for security analysis, and the sheer volume of log entries increases complexity. Although machine learning can make log analysis far simpler, log data is not optimal for security applications.  

Monitor the network - placing listeners in a cloud network (e.g., on a firewall) can give the ML model better data, but network monitoring has limitations. East-west, intra-VM, and geographically fluid entities are often out of reach, and activities like process or application launches and privilege changes are not visible at the network perimeter.  

Instrument workloads - extracting ML inputs directly from cloud workloads maximizes reach and optimizes inputs for the model. The associated administrative overhead depends on the vendor’s implementation tools. 
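
Whichever collection method a vendor uses, the raw events generally have to be normalized into a consistent schema before a model can learn connections, relationships, and behaviors from them. The sketch below is purely illustrative - the event fields, formats, and mappings are invented, not taken from any particular product:

```python
# Hypothetical sketch: normalizing events from different collection methods
# (log scraping vs. workload instrumentation) into one schema an ML model
# could consume. All field names and record formats are made up.
from dataclasses import dataclass

@dataclass
class Event:
    entity: str       # VM, container, or process that produced the event
    action: str       # e.g. "process_launch", "net_connect", "priv_change"
    target: str       # what the action touched
    timestamp: float

def from_syslog(line: str) -> Event:
    # e.g. "1699990000.5 web-01 exec /usr/bin/curl"
    ts, entity, action, target = line.split(maxsplit=3)
    return Event(entity, {"exec": "process_launch"}.get(action, action), target, float(ts))

def from_agent(record: dict) -> Event:
    # e.g. a record emitted by a hypothetical workload agent
    return Event(record["host"], record["event"], record["dst"], record["ts"])

events = [
    from_syslog("1699990000.5 web-01 exec /usr/bin/curl"),
    from_agent({"host": "web-01", "event": "net_connect", "dst": "10.0.3.7:5432", "ts": 1699990001.2}),
]
for e in sorted(events, key=lambda e: e.timestamp):
    print(e)
```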

How well does the model work in your environment? 

Rather than debating the merits of Bayesian classifiers vs. SVM algorithms, practitioners need a more pragmatic way to think about how an ML security solution will work in their cloud environment: 

  • Efficacy: does the solution spot attacks, or can attackers easily bypass it? Will the model catch all attacks or only known attacks?  
  • Efficiency: will the security team spend time improving protection or will they waste their days on wild goose chases and policy development?  
  • Scalability: cloud offerings can scale very quickly - can the security solution keep up? 

The following chart, which shows how an ML-based security solution might process and respond to events, is one way to think about these critical questions: 

From the practitioner's perspective, these are the questions that determine whether an ML-based solution delivers on its promises. Efficient? Effective? Scalable? This might remind you of the response the grizzled engineer gave his project manager about tradeoffs: “Fast, cheap, or good? Pick two.” But with machine learning, that bit of wisdom need no longer apply. 

What tools are available to assess an attack? 

Even “perfect” breach detection - no false positives and every attack detected - won’t fix underlying vulnerabilities. To take action and remediate problems, analysts need help investigating and understanding each incident. How did it start? What systems did it affect? Where else did it go? 

These are make or break questions. Too much information, at the wrong level of detail, can make it impossible to develop insights. But not enough information can lead to oversimplified or inaccurate conclusions. Here’s what to look for: 

  • Data aggregation: reducing cloud entities into a manageably small number of groups reveals the true operational behavior of the system  
  • Multidimensional visualizations: showing an attack from multiple perspectives (e.g., a user view showing which accounts were involved combined with a connectivity view showing which databases were involved) clarifies cause-and-effect relationships  
  • Correlation of events: timelines are an essential tool for understanding breach mechanics and revealing lateral attack movement (see the sketch after this list)  
  • Easy navigation: solving the puzzle of a breach is an interactive process that works best when investigators can easily explore all aspects of the system  
  • Compare/contrast tools: seeing changes to the system at two points in time allows investigators to quickly zero in on suspicious events 
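
As a rough illustration of the correlation point above, here’s a minimal sketch that orders events into per-entity timelines - the kind of view that makes lateral movement (web-01 reaching into db-02, followed by a privilege change and a bulk read) easy to spot. All entity names and events are hypothetical:

```python
# Hypothetical sketch: correlating events into per-entity timelines so an
# investigator can follow lateral movement. The event data is invented.
from collections import defaultdict

events = [
    {"ts": 100, "entity": "web-01", "action": "inbound_connect",  "detail": "203.0.113.9"},
    {"ts": 130, "entity": "web-01", "action": "process_launch",   "detail": "/tmp/payload"},
    {"ts": 170, "entity": "web-01", "action": "outbound_connect", "detail": "db-02:5432"},
    {"ts": 180, "entity": "db-02",  "action": "priv_change",      "detail": "svc -> admin"},
    {"ts": 240, "entity": "db-02",  "action": "bulk_read",        "detail": "customers table"},
]

# Group events by entity, keeping each entity's timeline in time order.
timeline = defaultdict(list)
for e in sorted(events, key=lambda e: e["ts"]):
    timeline[e["entity"]].append(e)

for entity, entries in timeline.items():
    print(entity)
    for e in entries:
        print(f"  t={e['ts']:>4}  {e['action']:<17} {e['detail']}")
```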

Will automation and machine learning change how cloud security professionals do their jobs?   

Of course they will.   

The right tools will take the drudgery out of data collection and analysis, and they’ll provide better insights, faster. It’s a brave new world - and the analyst/machine partnership is about to get much more efficient and effective.

Sanjay Kalra, Co-Founder and CPO at Lacework 
