
Five things to look for in an anomaly detection system

(Image credit: Pitney Bowes Software)

If there were any doubt as to the importance of data in our daily lives, consider these simple statistics: 90 per cent of data in use was created within the last two years, and that data is expected to grow from 33 zettabytes in 2018 to 175 ZB by 2025. All businesses depend on data for decision-making, but the sheer volume of data collected can often cause valuable information to be missed, lost, or misunderstood.

Anomaly detection solutions plug this gap by using techniques that spot deviations from normal behaviour. These anomalies can be positive (such as an increase in sales from a new product launch) or negative (for example, a problem that needs to be fixed quickly). Either way, anomaly detection provides a sharp focus on what’s important in very large datasets.
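As a rough illustration of what "deviation from normal behaviour" can mean in practice, the sketch below scores each point against the median of a series using the median absolute deviation (a modified z-score). The function name, the sample sales figures, and the threshold of 3.5 are assumptions chosen for illustration; commercial systems typically use far more sophisticated models.

```python
from statistics import median

def find_deviations(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold.

    The score uses the median and the median absolute deviation (MAD),
    so a single extreme point does not distort the baseline.
    """
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # no spread, so there is nothing to score against
    flagged = []
    for index, value in enumerate(values):
        score = 0.6745 * (value - centre) / mad
        if abs(score) > threshold:
            direction = "above" if score > 0 else "below"
            flagged.append((index, value, direction))
    return flagged

# Hypothetical daily sales figures; index 8 follows a product launch.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 180, 101]
flagged = find_deviations(daily_sales)
print(flagged)
```

The direction only says whether a value sits above or below the baseline. Whether that is good news or bad news depends on the metric.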

Anomaly detection is valuable for any business but particularly so for high-velocity businesses where large volumes of data are processed and actioned in real time. Whether in finance, eCommerce, gaming, or telecom, detecting anomalies in real time can help businesses avert disaster, seize opportunities, and be more efficient. Here are a few examples:

Finance: Anomalies could indicate illegal activities such as transactional fraud, identity theft, network intrusion, account takeover, or money laundering. Immediately detecting anomalies in your data, whether in failed and declined transaction rates, multiple login attempts, device usage, or the transaction amount per product, helps you avoid performance issues and security threats.

eCommerce: Spotting changes in behaviour can help improve product placement or inform the development of personalised product offers. Being alerted in real time to unexpected behaviour that poses a security threat, such as a distributed denial-of-service (DDoS) attack, means you can respond immediately to prevent fraud and revenue loss.

Gaming: System glitches or performance issues, whether in operating systems, levels, user segments, or different devices, can be spotted and fixed before they interrupt game-play, degrade player engagement, and damage hard-won brand equity.

Telecom: New, complex, IP-based services and their convergence with traditional voice services make network management challenging: a service loss, even in a small node, could affect thousands of customers.

But identifying anomalies isn’t straightforward. With thousands, or even millions, of metrics to consider, manual detection is impractical and would require a campus full of data analysts.

Fortunately, automated anomaly detection solutions fill this gap by combining modern data analysis techniques with machine learning to find even the subtlest anomalies. However, product features vary, and it pays to ask a few simple questions to find the best fit.

Does the solution work in real time?

It might be okay for a marketing team to get the results of a lead generation activity a few months after launch, since it takes that long for results to show in the sales pipeline. A corporate banking team and its regulator, however, will want to know straight away if unusually large payments are being made to a high-risk individual.

Although not every anomaly needs to be found in real time, it’s best to do so: you can always postpone action on an instant alert, but you cannot react in real time to an alert that arrives late.
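To make the real-time idea concrete, here is a minimal sketch of a detector that scores each value as it arrives rather than waiting for a batch. It keeps a running mean and variance with Welford's online algorithm, so it never stores the history. The class name, the warm-up length, the threshold, and the payment values are illustrative assumptions, not features of any particular product.

```python
import math

class StreamingDetector:
    """Scores each new value against the statistics of values seen so far."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # allowed distance, in standard deviations
        self.warmup = warmup        # points to observe before raising alerts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, x):
        """Return True if x looks anomalous, then fold x into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_anomaly = True
        # Welford's update: constant time and memory per point.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# Hypothetical payment amounts arriving one at a time.
payments = [50, 52, 49, 51, 50, 48, 51, 500, 50]
detector = StreamingDetector()
alerts = [detector.observe(p) for p in payments]
```

One design caveat: this sketch folds flagged values into the baseline, which inflates the variance after a large outlier. A production system would likely treat flagged points differently.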

Can it scale with your organisation?

Datasets can be relatively small (hundreds or thousands of metrics), in which case dashboards or other visualisation techniques might be fine. For very large datasets that contain millions of metrics, businesses should use a system capable of analysis on a massive scale. Even if data volumes are currently small, they won’t stay that way, so volume calculations should include headroom for future growth.

Can the system adapt with your business?

The rate of change helps determine which detection methods are most appropriate. Data that’s continually in flux, with new variables and changing patterns of behaviour, as is the case in most high-velocity businesses, needs to be evaluated using adaptive algorithms that adjust their behaviour as new information arrives.

Consider a ride-sharing company that’s quickly expanding its operations into new locations. More passengers, more cabs, different routes, and different peaks and troughs throughout the day add up to a complicated mix of variables that must be collected, interpreted and acted on. Manually monitoring the thousands, if not millions, of metrics involved would hardly be an effective way to scale.
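One simple way to picture an adaptive algorithm is a baseline that gives more weight to recent data. The sketch below uses an exponentially weighted moving average and variance: a sudden jump is flagged once, and the baseline then drifts towards the new level instead of alerting forever. The class name, the smoothing factor, the minimum standard deviation, and the ride counts are all hypothetical values picked for illustration.

```python
class AdaptiveBaseline:
    """Exponentially weighted mean and variance that track recent behaviour."""

    def __init__(self, alpha=0.3, threshold=3.0, min_std=1.0):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # allowed distance, in standard deviations
        self.min_std = min_std      # floor so a flat history is not over-sensitive
        self.mean = None
        self.var = 0.0

    def observe(self, x):
        """Return True if x departs from the current baseline, then adapt."""
        if self.mean is None:
            self.mean = float(x)    # first point just seeds the baseline
            return False
        std = max(self.var ** 0.5, self.min_std)
        is_anomaly = abs(x - self.mean) > self.threshold * std
        # Incremental update of the weighted mean and variance.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly

# Hypothetical rides per hour: demand doubles after a new city launches.
rides = [100] * 10 + [200] * 10
baseline = AdaptiveBaseline()
flags = [baseline.observe(r) for r in rides]
```

In this run only the first hour at the new level is flagged. After that the baseline has started moving towards 200, so the higher demand becomes the new normal.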

Does the system correlate anomalies to assist in root cause investigation?

Just detecting anomalies is not enough. Quickly understanding the cause is key to remediating a potential issue or taking advantage of an opportunity.

For a small number of metrics, you can get by with reporting anomalies for each metric separately and manually figuring out which are related.

But taking the same approach for millions of metrics will result in a huge number of reported anomalies, making it impossible to work out cause and effect.

Machine learning algorithms that perform clustering and similarity analysis (e.g., LDA, SOM, stacked auto-encoders) can find relationships between metrics and distil the flood of anomalies into a more manageable set of correlated incidents for investigation by human experts. It is important to verify that the system implements these algorithms in a way that works at the scale required (millions of metrics) while still producing accurate correlations.
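The clustering methods named above are beyond a short example, so the sketch below swaps in a much simpler stand-in: it groups anomalies from different metrics into one incident when they occur close together in time. It is not LDA, SOM, or an auto-encoder, and the function name, the five-minute gap, and the metric names are assumptions made purely for illustration.

```python
def group_into_incidents(anomalies, max_gap=300):
    """Group (timestamp_seconds, metric_name) pairs into incidents.

    An anomaly joins the current incident if it starts within max_gap
    seconds of that incident's latest anomaly; otherwise it opens a new one.
    """
    incidents = []
    for ts, metric in sorted(anomalies):
        if incidents and ts - incidents[-1]["end"] <= max_gap:
            incidents[-1]["end"] = ts
            incidents[-1]["metrics"].add(metric)
        else:
            incidents.append({"start": ts, "end": ts, "metrics": {metric}})
    return incidents

# Hypothetical anomalies reported separately for four metrics.
reported = [
    (1200, "cart_abandonment"),
    (1000, "checkout_errors"),
    (9000, "login_failures"),
    (1060, "payment_latency"),
]
incidents = group_into_incidents(reported)
```

Four separate alerts collapse into two incidents here. Time proximity alone will produce false groupings at scale, which is why the similarity between metrics matters in a real system.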

Can the system correlate anomalies with known events?

Events often explain why an anomaly is happening. The release of a new version could cause an app to crash more frequently. A glitch in Facebook’s ads could lead to a drop in your site’s daily visitors. A closure on a subway line could cause more residents to use ride-sharing platforms.

The system has to learn the relationship between events and anomalies. And that again takes an element of machine learning. Manual correlation is hardly scalable.
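As a starting point before any learning is involved, anomalies can be lined up against a log of known events by time. The sketch below lists, for each anomaly, the events that happened shortly before it. This is a plain time-window lookup rather than a learned relationship, and the function name, the one-hour lookback, and the event descriptions are hypothetical.

```python
def match_events(anomaly_times, events, lookback=3600):
    """Map each anomaly timestamp to events in the preceding lookback seconds.

    events is a list of (timestamp_seconds, description) pairs.
    """
    matches = {}
    for anomaly_ts in anomaly_times:
        matches[anomaly_ts] = [
            description
            for event_ts, description in events
            if 0 <= anomaly_ts - event_ts <= lookback
        ]
    return matches

# Hypothetical event log and anomaly timestamps, in seconds.
known_events = [
    (1000, "app version released"),
    (20000, "subway line closed"),
]
anomaly_times = [2500, 9000, 21000]
candidates = match_events(anomaly_times, known_events)
```

An anomaly with an empty list has no known event nearby and needs a different line of investigation. A learning system would go further and estimate which kinds of event tend to precede which kinds of anomaly.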

Conclusion

Companies can lose a lot if they don’t find and act quickly on business problems or opportunities. Customers can be lost, new sources of revenue missed, or profit margins dented. Anomaly detection solutions help by:

  • Finding anomalies in real time,
  • Being able to scale as the business grows,
  • Adapting to changing patterns of behaviour,
  • Correlating anomalies to understand the underlying cause, and
  • Matching anomalies to business events.

 Amit Levi, VP of product and marketing, Anodot