
Security in the cloud - what should you be tracking?

(Image credit: Melpomene / Shutterstock)

As you move your applications into the cloud, your approach to security will have to follow. You will be in good company - Gartner has forecast that worldwide end-user spending on public cloud services will grow 23.1 percent in 2021 to $332.3 billion, up from $270 billion in 2020. Cloud security spending will increase alongside this, with just over $16 billion to be spent as part of this shift.

For security teams, keeping track of all these cloud services, applications and data will require some changes. While the traditional security model based on perimeter security still has some validity, the tools and techniques that worked for on-premises IT are no longer fit for purpose. There are a few reasons for this.

The first is that security is no longer a “human scale” problem. While the security operations center model is still necessary, the days of keeping up with potential threats by adding more people alone are gone. In the past, analysts would have tens or hundreds of threats to investigate between them; today, that number has gone up to tens of thousands of potential issues that might need investigation every day.

Simply adding more people is not a scalable model: the number of people with the right skills is limited, they are heavily in demand, and the volume of issues is beyond what people alone can handle. The deluge of data and the scale of potential threats mean that security teams have to modernize their security operations and look at cloud-based security information and event management (SIEM) and security orchestration, automation and response (SOAR) approaches.

Automating security workflows is the obvious next step, but this relies on getting the right process and the right data in place. What you currently look at will be useful, but what should your priorities be in the cloud?

Understanding your cloud security responsibilities 

The starting point for this approach is to understand your attack surface. This covers all the assets, technology, applications and data that make up your IT, and will include an inventory of all the different points of entry that could be used to access your information. It’s important to think about information security in general too - traditionally, this would have covered physical access to your building, but today this will include remote working and all the new patterns that have been adopted following the pandemic.

For cloud-based applications, the attack surface will include a range of new points to consider alongside your existing inventory. To evaluate this, there is a range of questions to ask:

  • Physical access to the infrastructure - who can touch those servers, and what can they do to them? 
  • Network ingress from the internet and intranet - how does data get to and from the cloud, and is this access on solely private networks or publicly available?
  • APIs with public and private access - who can access your APIs and where from? 
  • Issues within your application - are there any faults in the code your software developers have put together that can be exploited? 
  • Bugs and vulnerabilities in software libraries that are used - are those libraries checked for issues over time, and updated regularly? 
  • Traffic that isn’t appropriately validated and sanitized - can unauthorized individuals send packets to your cloud-based infrastructure, and what happens if they do? 

Once you have completed this inventory, you should then have a complete picture of your infrastructure and how data gets both in and out of it, as well as your responsibility levels for each of these areas. For physical access to cloud infrastructure, this would normally be the responsibility of the provider that you use, so examine their policy and check it is enough. Similarly, for third-party software or application libraries, you can track their updates and potential issues. 

Using this picture as a guide, you can then set up monitoring to track activity over time. This should give you the data needed to follow how secure these individual elements are. 

Metrics to watch for 

Monitoring this set of data supports a more proactive approach to security. By correlating behavior and analyzing a baseline of activity, you can then watch for rogue data that does not conform to what you expect.

The first essential metric to track is failed access attempts, especially those that occur rapidly or methodically over time. This should go wider than user accounts and failed password entries, and cover any entry point that has an authentication requirement. By looking at normal access patterns for your APIs or other resources such as file stores and databases, you can see where there are deviations and investigate. You can also rate-limit access requests and apply multi-factor authentication for sensitive assets, which makes it harder for attackers to automate their attempts.
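As a rough illustration of tracking rapid failed access attempts, the sketch below counts failures per source inside a sliding time window. The event format, the five-minute window, the threshold of five and the function name find_bursts are all assumptions made for this example, not values taken from any particular product.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_bursts(events, window=timedelta(minutes=5), threshold=5):
    """Return sources with at least `threshold` failures inside any `window`.

    `events` is an iterable of (timestamp, source) pairs for failed
    attempts against any authenticated entry point (hypothetical format).
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, source in sorted(events):
        attempts = recent[source]
        attempts.append(ts)
        # Drop attempts that have fallen out of the sliding window.
        while ts - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= threshold:
            flagged.add(source)
    return flagged


start = datetime(2021, 6, 1, 9, 0, 0)
# Six failures in one minute from one source, three spread over hours from another.
events = [(start + timedelta(seconds=10 * i), "203.0.113.7") for i in range(6)]
events += [(start + timedelta(hours=i), "198.51.100.4") for i in range(3)]
flagged = find_bursts(events)
print(flagged)
```

The same counting approach applies to API keys, database logins or file store requests, as long as each failure can be tied to a source and a timestamp.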

Another metric to track over time is the amount of traffic moving around your systems. This should follow a reasonably predictable pattern for most of your internal applications. If you do experience a sudden spike in traffic volume, or there is another anomaly in the pattern, that can alert you to potentially malicious activity. When attackers do get access, they may wait before they begin exfiltrating data, so watching the pattern over time, rather than only at the moment of entry, can surface activity that needs to be investigated.
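One simple way to compare traffic against a baseline is a z-score check, sketched below. The sample volumes, the threshold of three standard deviations and the function name is_anomalous are illustrative assumptions; production systems typically use richer seasonal models.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, z_threshold=3.0):
    """Flag `observed` if it sits more than `z_threshold` standard
    deviations away from the mean of the `baseline` samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


# Hypothetical hourly transfer volumes (MB) for an internal application.
baseline_mb = [120, 118, 125, 122, 119, 121, 124, 120]

normal_hour = is_anomalous(baseline_mb, 123)
spike_hour = is_anomalous(baseline_mb, 400)
print(normal_hour, spike_hour)
```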

For some companies, applications may vary in their usage depending on customer behavior - a good example would be media or online gaming companies, where demand for services will be linked to programs or sports events that will trigger customer activity. Understanding these spikes and bursts in demand can help as well - if you suddenly see a burst starting where there is no event expected to trigger interest, then it is time to investigate further. It may be an unexpected wave of demand, or it may be something more sinister, but the data should help to point out where to begin.

Monitoring the source and destination of traffic can also indicate problems, especially when compared against lists of networks known to support malicious content. Users may have to download files or media for their jobs, but it is also important to stop any unauthorized activity that may open the door to attackers.
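Checking flow records against a list of known-bad networks can be sketched with Python's standard ipaddress module. The networks and flows below use documentation address ranges and are placeholders; a real deployment would load a maintained threat intelligence feed instead.

```python
import ipaddress

# Placeholder blocklist built from documentation ranges (assumption).
BLOCKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.128/25"),
]


def on_blocklist(address, networks=BLOCKLIST):
    """Return True if `address` falls inside any blocklisted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


# Hypothetical (source, destination) flow records.
flows = [
    ("10.0.0.5", "203.0.113.9"),
    ("10.0.0.8", "192.0.2.44"),
    ("10.0.0.9", "198.51.100.200"),
]

suspicious = [flow for flow in flows if on_blocklist(flow[0]) or on_blocklist(flow[1])]
print(suspicious)
```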

Another important metric to track is session length. With everyone working remotely, VPN access will be more frequent than before, but it is still important to track session lengths as this could be an indicator of compromise. For example, a network session that remains open for an extended period outside working hours could indicate an unauthorized VPN tunnel set up to transfer data. Spotting these sessions could show up either a malicious attacker or someone who should be educated on best practice on access. 
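A minimal sketch of the session-length check follows. It assumes sessions arrive as (user, start, end) records, treats 08:00 to 18:00 as working hours and four hours as the maximum expected length; all of these are example values that would need tuning, and the check looks only at when a session starts.

```python
from datetime import datetime, timedelta


def long_off_hours_sessions(sessions, max_length=timedelta(hours=4),
                            work_start=8, work_end=18):
    """Return users whose sessions began outside working hours and
    stayed open longer than `max_length`."""
    flagged = []
    for user, start, end in sessions:
        off_hours = start.hour < work_start or start.hour >= work_end
        if off_hours and end - start > max_length:
            flagged.append(user)
    return flagged


# Hypothetical VPN session records.
sessions = [
    ("alice", datetime(2021, 6, 1, 9, 0), datetime(2021, 6, 1, 17, 0)),
    ("bob", datetime(2021, 6, 1, 23, 0), datetime(2021, 6, 2, 5, 0)),
    ("carol", datetime(2021, 6, 1, 22, 0), datetime(2021, 6, 1, 22, 30)),
]

flagged_users = long_off_hours_sessions(sessions)
print(flagged_users)
```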

In addition to tracking user session lengths, it is also worth paying special attention to connections established with ports used for remote access, like port 22 (SSH), port 23 (Telnet), and port 3389 (RDP). These ports carry plenty of legitimate traffic, but they can also be used to exfiltrate data, so knowing what normal, expected behavior looks like makes other activity stand out.
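To illustrate, the sketch below counts connections to the three remote access ports that do not match a known baseline. The connection record format and the idea of an allowed set of (source, port) pairs are assumptions made for this example.

```python
from collections import Counter

REMOTE_ACCESS_PORTS = {22: "SSH", 23: "Telnet", 3389: "RDP"}


def unexpected_remote_access(connections, allowed):
    """Count remote access connections that are not in the baseline.

    `connections` is an iterable of (source, destination_port) pairs and
    `allowed` is a set of pairs observed during normal operation.
    """
    counts = Counter()
    for source, port in connections:
        if port in REMOTE_ACCESS_PORTS and (source, port) not in allowed:
            counts[(source, REMOTE_ACCESS_PORTS[port])] += 1
    return counts


# Hypothetical connection log and baseline.
connections = [
    ("10.0.0.5", 22),
    ("10.0.0.5", 443),
    ("10.0.0.77", 3389),
    ("10.0.0.77", 3389),
]
allowed = {("10.0.0.5", 22)}

unexpected = unexpected_remote_access(connections, allowed)
print(unexpected)
```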

Set up to succeed 

In addition to looking at data from your applications and cloud infrastructure over time, your approach should also extend to how your systems are configured and access controlled. Looking out for policy violations can show where someone is not following the accepted rules, and this could be the sign of a disgruntled employee or an outside attacker. For example, you should have a policy on access to data stores covering encryption and access control. Auditing this policy on a regular basis will help, but automating the process so that it runs continuously means violations are identified as soon as they occur. Whether a violation is deliberate or not, catching it quickly makes compliance easier.
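An automated policy check of this kind could look something like the following sketch. The configuration fields encrypted and public_access, and the store names, are hypothetical; real cloud providers expose this information through their own APIs and naming.

```python
def audit_data_stores(stores):
    """Return (store name, violation) pairs for stores that break a
    simple policy: encryption on, no public access."""
    violations = []
    for store in stores:
        # A missing "encrypted" field is treated as a violation (fail closed).
        if not store.get("encrypted", False):
            violations.append((store["name"], "encryption disabled"))
        if store.get("public_access", False):
            violations.append((store["name"], "publicly accessible"))
    return violations


# Hypothetical data store configurations.
stores = [
    {"name": "customer-db", "encrypted": True, "public_access": False},
    {"name": "export-bucket", "encrypted": False, "public_access": True},
    {"name": "logs", "public_access": False},
]

violations = audit_data_stores(stores)
print(violations)
```

Running a check like this on a schedule, or on every configuration change, is what turns a periodic audit into continuous monitoring.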

Another systems configuration area to track over time is the use of security certificates. Certificates should be used to validate a range of infrastructure components and connections, from user devices and connections through to transfer of data between devices and data storage. These certificates have to be correctly implemented themselves and they all have times when they become invalid - an invalid certificate can break websites and applications, as well as leading to potential security failures, so you should regularly track and audit all your certificates for compliance.
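As a small example of certificate tracking, the sketch below flags certificates that have expired or will expire within 30 days. It assumes you already hold an inventory mapping each host name to its certificate's notAfter string; the host names, dates and the 30-day warning period are made up for illustration. The standard library function ssl.cert_time_to_seconds parses that date format.

```python
import ssl
from datetime import datetime, timedelta, timezone


def expiring_certificates(inventory, now, warn_within=timedelta(days=30)):
    """Return (name, status) for certificates that are expired or that
    expire within `warn_within` of `now`."""
    results = []
    for name, not_after in inventory.items():
        expires = datetime.fromtimestamp(
            ssl.cert_time_to_seconds(not_after), tz=timezone.utc
        )
        remaining = expires - now
        if remaining <= warn_within:
            status = "expired" if remaining < timedelta(0) else "expiring soon"
            results.append((name, status))
    return results


# Hypothetical inventory of certificate expiry dates.
inventory = {
    "api.example.com": "Jun 15 12:00:00 2021 GMT",
    "files.example.com": "Dec 31 23:59:59 2022 GMT",
    "legacy.example.com": "Jan 10 00:00:00 2021 GMT",
}
now = datetime(2021, 6, 1, tzinfo=timezone.utc)

report = expiring_certificates(inventory, now)
print(report)
```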

Alongside tracking certificates, you should also audit user policies and access levels. Some of your users may have root or superuser access to IT systems; however, this level of access should be tightly controlled and only used when necessary. Auditing this access should show where it is needed, and any accounts that no longer require full control should be downgraded. Any accounts with cloud providers should have multi-factor authentication applied to them by default as well.

It is also worth monitoring access by any third parties or consultants that you have. These accounts will often need as much access to your systems as your employees, so ensure that they have the appropriate access levels and that they aren't able to access systems beyond the scope of your agreement. Finally, implement policies that regulate how the system manages changes to an employee or partner's status. Each change in a user's status should trigger an audit of their access rights and remove any access that is no longer required. When you terminate an employee, policies and system processes should quickly remove all access from the system.
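The status-change audit described above can be sketched as a comparison between what a user currently holds and what their new status permits. The role names, entitlements and the function access_to_revoke are hypothetical examples, not a reference to any specific identity management product.

```python
# Hypothetical mapping of user status to permitted systems.
ROLE_ENTITLEMENTS = {
    "engineer": {"source-control", "ci", "staging"},
    "contractor": {"source-control"},
    "terminated": set(),
}


def access_to_revoke(current_access, new_status):
    """Return the access a user holds that their new status does not allow.

    An unknown status permits nothing, so everything is revoked.
    """
    allowed = ROLE_ENTITLEMENTS.get(new_status, set())
    return set(current_access) - allowed


current = {"source-control", "ci", "staging", "billing-db"}

after_move = access_to_revoke(current, "contractor")
after_exit = access_to_revoke(current, "terminated")
print(sorted(after_move), sorted(after_exit))
```

Treating an unrecognized status as having no entitlements is a deliberate choice here, so that a gap in the mapping removes access instead of leaving it in place.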

If the worst takes place 

Inevitably, there will come a time when a threat comes in that needs to be addressed. It’s now accepted that every company will be at risk of a breach, and that these breaches become a question not of if but when. In order to respond quickly to these risks and attempted breaches, you should look at your response metrics too.

The two things to track in your security team are Mean Time To Detect (MTTD) and Mean Time To Respond (MTTR). MTTD covers how quickly you detect the signs of a potential breach into your systems and MTTR measures the time it takes to investigate and respond to that risk event. Ideally, both of these metrics should be as low as possible. 
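Both figures are simple averages once incidents carry timestamps. The sketch below assumes each incident records when it occurred, when it was detected and when it was resolved, and it measures MTTR from detection to resolution; teams define these boundaries differently, so treat the field names and the definition as assumptions.

```python
from datetime import datetime, timedelta


def mean_delta(pairs):
    """Average the time difference across (earlier, later) timestamp pairs."""
    deltas = [later - earlier for earlier, later in pairs]
    return sum(deltas, timedelta(0)) / len(deltas)


# Hypothetical incident records.
incidents = [
    {"occurred": datetime(2021, 6, 1, 9, 0),
     "detected": datetime(2021, 6, 1, 9, 30),
     "resolved": datetime(2021, 6, 1, 11, 30)},
    {"occurred": datetime(2021, 6, 2, 14, 0),
     "detected": datetime(2021, 6, 2, 15, 30),
     "resolved": datetime(2021, 6, 2, 16, 30)},
]

mttd = mean_delta([(i["occurred"], i["detected"]) for i in incidents])
mttr = mean_delta([(i["detected"], i["resolved"]) for i in incidents])
print(mttd, mttr)
```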

Automating steps in your detection and analysis process can help your team to cut their MTTD figure by flagging incidents that should be investigated and followed up. Similarly, your security analytics and observability data should help your team to look into potential risks or outliers in the data faster. By reducing the manual process as much as possible, you can help your team to be more efficient and effective.

For companies moving into the cloud, dealing with the sheer volume of data coming in can be the biggest initial hurdle to implementing successful cloud security processes. Once you have the right processes in place to deal with data, you can make the most of the opportunities to automate processes and take out manual work. Setting the right metrics and understanding where that data comes from can make that process itself more successful.

Iain Chidgey, Vice President EMEA, Sumo Logic