Manual threat intelligence management: Doing it the hard way - Part 2

This is the second post in a series on manual threat intelligence management.

Once captured, threat intelligence data must be processed. Processing includes several steps:

  • Normalisation
  • Deduplication
  • Storage of indicators
  • Update, expiration and removal of old indicators
  • Score/Weight intelligence
  • Enrich indicators for context
  • Associate indicators with Actors, Campaigns, Incidents, TTPs, etc.
  • Track and maintain Actor Alias list

If you have chosen more than a handful of feeds, you will likely encounter a variety of formats. If you’re lucky, it will be something structured specifically for intelligence, like STIX, OpenIOC or CybOX. Others will use XML or JSON, which are also structured but were not created specifically for threat intelligence. The rest will arrive as unstructured text in a variety of file formats. You could receive intelligence via CSV, .txt, PDF, Word document, or any other text format. You will need the expertise to normalise these disparate feeds by parsing the required data out of each one.
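As a rough illustration of what "one parser per feed" can look like, the sketch below maps three differently shaped inputs onto one common record. The column names (indicator, type), the JSON keys (ioc, kind) and the record layout are all assumptions made for this example, not the format of any real feed.

```python
import csv
import io
import json
import re

# Patterns for pulling indicators out of free text (IPv4 addresses and MD5 hashes only).
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def parse_csv_feed(text, source):
    """Parse a CSV feed with assumed columns 'indicator' and 'type'."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"type": row["type"], "value": row["indicator"], "source": source}
        for row in reader
    ]


def parse_json_feed(text, source):
    """Parse a JSON feed assumed to be a list of {"ioc": ..., "kind": ...} objects."""
    return [
        {"type": item["kind"], "value": item["ioc"], "source": source}
        for item in json.loads(text)
    ]


def parse_text_report(text, source):
    """Extract IPv4 addresses and MD5 hashes from unstructured text."""
    records = [
        {"type": "ipv4", "value": match, "source": source}
        for match in IPV4_RE.findall(text)
    ]
    records += [
        {"type": "md5", "value": match.lower(), "source": source}
        for match in MD5_RE.findall(text)
    ]
    return records
```

Every parser returns the same record shape, so everything downstream (deduplication, storage, scoring) only has to understand one format. A real deployment would need many more patterns and a parser per source.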

Parsing these feeds could require a sophisticated understanding of regular expressions and/or JSON/XML. Expect to create a different parser for each of your unstructured sources of intelligence. You will also need to store the parsed information in a database for later use. To give you a sense of scale for this initial repository, remember that, today, collecting a large number of feeds could result in as many as 10 million indicators per day or more. That number will only increase with time. Plan accordingly.

Before storage, however, you should deduplicate your collected indicators to reduce the overall processing load. Take care in the preceding normalisation step: incorrectly normalised indicators will not be identical, and therefore will not be deduplicated, resulting in unnecessary load and copies in later stages of the process. These duplicates could even lead to duplicate alerts and investigations.
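To make the link between normalisation and deduplication concrete, here is a minimal sketch. The normalisation rules (lower-casing, stripping a trailing dot, undoing "[.]" defanging) and the record layout are assumptions chosen for illustration; your own rules will depend on your feeds.

```python
import ipaddress


def normalise(indicator_type, value):
    """Return a canonical form so that equal indicators compare equal."""
    value = value.strip()
    if indicator_type == "ipv4":
        # Validates the address and returns its standard dotted form.
        return str(ipaddress.ip_address(value))
    if indicator_type == "domain":
        # Undo "[.]" defanging, lower-case, and drop a trailing root dot.
        return value.replace("[.]", ".").lower().rstrip(".")
    if indicator_type in ("md5", "sha1", "sha256"):
        return value.lower()
    return value


def deduplicate(records):
    """Collapse records with the same (type, normalised value), keeping all sources."""
    seen = {}
    for record in records:
        key = (record["type"], normalise(record["type"], record["value"]))
        entry = seen.setdefault(
            key, {"type": key[0], "value": key[1], "sources": set()}
        )
        entry["sources"].add(record["source"])
    return list(seen.values())
```

Without the normalise step, "Evil.Example.COM." and "evil[.]example[.]com" would be stored as two separate indicators even though they refer to the same domain.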

One way to handle deduplication is to check the database for a particular piece of data before adding it as a new entry. If it is already there, tag the existing entry to note that it was also found in another source; this is useful context.

Once you have normalised, cleansed, and stored your chosen indicators, you must maintain those you collected previously, because indicators change over time. Sometimes they change type, such as going from a Scanning IP in March 2014 to a Brute Force IP in May 2015. You need not only to capture and reflect these changes over time, but also to “expire” indicators after some period of time.
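The existence check, source tagging and change tracking described above might be sketched like this, using SQLite from the Python standard library. The table names, columns and the record_indicator helper are hypothetical; any database would do, and at millions of indicators per day you would likely batch these operations rather than commit one at a time.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Create the (hypothetical) schema used in this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS indicators (
               type TEXT NOT NULL, value TEXT NOT NULL, category TEXT,
               first_seen TEXT NOT NULL, last_seen TEXT NOT NULL,
               PRIMARY KEY (type, value))"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sightings (
               type TEXT NOT NULL, value TEXT NOT NULL, source TEXT NOT NULL,
               PRIMARY KEY (type, value, source))"""
    )
    conn.execute(
        """CREATE TABLE IF NOT EXISTS category_history (
               type TEXT NOT NULL, value TEXT NOT NULL,
               category TEXT, changed_at TEXT NOT NULL)"""
    )
    return conn


def record_indicator(conn, ind_type, value, category, source, seen_at=None):
    """Insert or update an indicator; return True if it was new."""
    seen_at = seen_at or datetime.now(timezone.utc).isoformat()
    row = conn.execute(
        "SELECT category FROM indicators WHERE type = ? AND value = ?",
        (ind_type, value),
    ).fetchone()
    is_new = row is None
    if is_new:
        conn.execute(
            "INSERT INTO indicators VALUES (?, ?, ?, ?, ?)",
            (ind_type, value, category, seen_at, seen_at),
        )
    else:
        conn.execute(
            "UPDATE indicators SET category = ?, last_seen = ? "
            "WHERE type = ? AND value = ?",
            (category, seen_at, ind_type, value),
        )
    # Keep a history row whenever the category is first set or changes.
    if is_new or row[0] != category:
        conn.execute(
            "INSERT INTO category_history VALUES (?, ?, ?, ?)",
            (ind_type, value, category, seen_at),
        )
    # Tag the entry with this source; a repeat sighting from the same source is ignored.
    conn.execute(
        "INSERT OR IGNORE INTO sightings VALUES (?, ?, ?)",
        (ind_type, value, source),
    )
    conn.commit()
    return is_new
```

Recording the same IP first as a scanner and later as a brute-force source leaves one row in indicators, two rows in sightings (one per source) and two rows in category_history.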

The expiration period can be an arbitrary time frame that is set globally, say 30, 60 or 90 days, or it can be set individually by indicator type. Be aware, though, that failing to expire indicators promptly will increase false positives, while expiring them too quickly risks missed detections. It is a balance that must be struck, monitored and adjusted as needed.

Next, you will want to score and/or weight your intelligence in some fashion. Both let you prioritise certain indicators or sources so that you can focus your attention on those first, among the millions of indicators consumed each day. Do you trust one feed more than another? Give it a higher weight. Use that weight in your evaluation rules to prefer information from this source.
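The expiration policy discussed above, a global default with optional per-type overrides, could look something like the following sketch. The specific lifetimes assigned to each type and the record layout (a dict with type and last_seen) are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Global default lifetime, plus illustrative per-type overrides.
DEFAULT_TTL = timedelta(days=60)
TTL_BY_TYPE = {
    "ipv4": timedelta(days=30),   # assumption: IP addresses change hands quickly
    "domain": timedelta(days=60),
    "md5": timedelta(days=90),    # assumption: file hashes stay relevant longer
}


def is_expired(indicator, now=None):
    """True if the indicator was last seen longer ago than its type allows."""
    now = now or datetime.now(timezone.utc)
    ttl = TTL_BY_TYPE.get(indicator["type"], DEFAULT_TTL)
    return now - indicator["last_seen"] > ttl


def split_active(indicators, now=None):
    """Partition indicators into (active, expired) lists."""
    now = now or datetime.now(timezone.utc)
    active, expired = [], []
    for indicator in indicators:
        (expired if is_expired(indicator, now) else active).append(indicator)
    return active, expired
```

Keeping the lifetimes in one table makes the "monitored and adjusted as needed" part a configuration change rather than a code change.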

Do you consider one type of indicator more threatening than another? Most do, but you will need to define the types yourself, decide how you will classify them, and then incorporate these values and weights into your evaluation of what to present to your analysts. Scoring and weighting are the first enrichments you will perform on your intelligence data.

Since you want to maximise the number of events and incidents your analysts can triage each day, you may choose to enrich your indicators further for context. Beyond scoring and weighting, enrichment can mean many things: for example, GeoIP data, WHOIS records, or reports from sites like VirusTotal or Shodan. Basically, anything that helps your analysts make a decision in the shortest amount of time should be considered at this step.
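One simple way to combine feed weights with indicator-type severity is to multiply them, as in the sketch below. The feed names, weights, categories and severity values are invented for illustration, and multiplication is only one of many possible scoring schemes.

```python
# Illustrative values only: how much each feed is trusted (0.0 to 1.0).
FEED_WEIGHT = {"commercial-feed": 0.9, "community-list": 0.5}
DEFAULT_FEED_WEIGHT = 0.3

# Illustrative values only: how threatening each category is (0 to 100).
TYPE_SEVERITY = {"c2": 90, "malware_hash": 80, "brute_force": 50, "scanner": 20}
DEFAULT_SEVERITY = 10


def score(indicator):
    """Severity of the category, scaled by the most trusted source reporting it."""
    weights = [FEED_WEIGHT.get(s, DEFAULT_FEED_WEIGHT) for s in indicator["sources"]]
    best_weight = max(weights, default=DEFAULT_FEED_WEIGHT)
    severity = TYPE_SEVERITY.get(indicator["category"], DEFAULT_SEVERITY)
    return round(severity * best_weight, 1)


def prioritise(indicators):
    """Return indicators ordered from highest to lowest score."""
    return sorted(indicators, key=score, reverse=True)
```

Under these example values, a command-and-control indicator from the less trusted feed (90 × 0.5 = 45.0) still outranks a scanner from the most trusted one (20 × 0.9 = 18.0), which may or may not be what you want. Tuning such interactions is part of the maintenance burden.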

Enrichment challenges include possible costs for commercial enrichment sources, the coding or scripting necessary to integrate them with your indicator database, and the maintenance of the mechanisms that enable the integration. Each new source of context increases the size of an indicator, so planning should account for the increased storage requirements. Advanced enrichments might include associations with actors, campaigns or incidents, and tracking of actor aliases. These further enable analysts to gather all relevant information on indicators into one place, requiring less manual research and enabling timelier decision-making.
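A sketch of how enrichment sources and an actor alias list might be wired in is shown below. The enrichers here are stand-in functions reading from a local table; real GeoIP, WHOIS or reputation lookups would call out to services whose APIs are not shown. The actor names are placeholders, not real groups.

```python
# Stand-in lookup table; a real enricher would query a GeoIP database or service.
GEO_TABLE = {"203.0.113.7": "NL"}


def geo_enricher(indicator):
    """Hypothetical enricher: return a country code for a known IP, else None."""
    return {"country": GEO_TABLE.get(indicator["value"])}


def enrich(indicator, enrichers):
    """Run every enricher and attach its result under the 'context' key."""
    enriched = dict(indicator)
    context = {}
    for name, enricher in enrichers.items():
        try:
            context[name] = enricher(indicator)
        except Exception as exc:
            # One failing source should not block the others.
            context[name] = {"error": str(exc)}
    enriched["context"] = context
    return enriched


def build_alias_index(canonical_to_aliases):
    """Map every known name (case-insensitively) to one canonical actor name."""
    index = {}
    for canonical, aliases in canonical_to_aliases.items():
        for name in {canonical, *aliases}:
            index[name.casefold()] = canonical
    return index


def resolve_actor(name, index):
    """Return the canonical actor name, or None if the alias is unknown."""
    return index.get(name.casefold())
```

For example, with an alias list of {"Example Actor 1": {"EA-1", "Crimson Placeholder"}}, both "ea-1" and "CRIMSON PLACEHOLDER" resolve to "Example Actor 1", so reports that use different vendor names can be gathered under one record.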

Actioning threat intelligence

Although a database of indicators and contextual information is useful, it is not enough. Once a storehouse of normalised, vetted, enriched information has been created, organisations must devise a means to use it. To confer real-time, let alone proactive, benefit, the collected intelligence may be provided to some other security technology already in place. Most often this is the SIEM or log management solution, but it can include other technologies as well.

For example, firewalls that support it could be given a list of IPs, domains or URLs to block automatically. Similarly, web proxies could be given URL or domain information to do the same for user web traffic. IDS/IPS is another possible integration point, and some might opt to deliver MD5 or SHA hashes to endpoint protection solutions to extend the lists of malware for which they monitor.

After identifying the technology or technologies you wish to integrate, you must extract the normalised intelligence data and forward it to that destination. To do this, you will need to determine which fields are useful to each technology, create queries to retrieve that information, reformat it into something the technology will understand, and then create a mechanism to forward it to the device(s) involved. For example, if you are using ArcSight, you will need to send it in the CEF format.
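The reformatting step for a CEF destination might look roughly like this. A CEF message is a pipe-delimited header (version, vendor, product, device version, event class ID, name, severity) followed by key=value extensions. The vendor and product strings, the choice of custom string fields (cs1, cs2) and the mapping of a 0 to 100 score onto CEF's 0 to 10 severity are all assumptions for this sketch; check what your SIEM content expects.

```python
def _escape_header(value):
    """Escape backslashes and pipes, which delimit CEF header fields."""
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    """Escape backslashes, equals signs and newlines in extension values."""
    return (
        str(value).replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")
    )


def to_cef(indicator, vendor="ExampleOrg", product="IntelForwarder", version="1.0"):
    """Render one indicator as a CEF line (field choices are illustrative)."""
    # Map a 0-100 score onto CEF's 0-10 severity scale.
    severity = max(0, min(10, round(indicator.get("score", 0) / 10)))
    header = [
        "CEF:0",
        vendor,
        product,
        version,
        indicator["category"],                          # event class ID
        f"Threat indicator: {indicator['category']}",   # human-readable name
        severity,
    ]
    extension = {
        "cs1Label": "indicatorType",
        "cs1": indicator["type"],
        "cs2Label": "indicatorValue",
        "cs2": indicator["value"],
    }
    head = "|".join(_escape_header(field) for field in header)
    ext = " ".join(f"{key}={_escape_extension(val)}" for key, val in extension.items())
    return f"{head}|{ext}"
```

The forwarding mechanism itself (syslog, file drop, API call) is left out here; it depends entirely on what the destination accepts.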

For each additional integration, you will need to repeat this process with another forwarder, each of which needs to be maintained over time.

Once the information has arrived at its destination, you must create “content” that will take advantage of it. In the case of firewalls, it could be as simple as creating a block list and writing a rule to reference it. In the case of SIEMs, it might include custom parsers, lists, rules, alerts, dashboards and reports. Each integration will require its own content to be created. Just as with every other component in the process, this must also be maintained and updated over time.

The final task in this stage is to automate the indicator import and expiration process. Indicator import is obvious, but expiration is equally important to avoid overloading the integrated technologies with lists that grow ever larger over time. Without automation, you will have to establish and manage a manual import and expiration process.
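As one possible shape for that automation, the sketch below regenerates a plain-text IP block list from scratch on every run, so expired entries simply drop out instead of accumulating. The one-address-per-line file format, the record layout and the function names are assumptions; your firewall may expect something different.

```python
import os
import tempfile
from datetime import datetime, timezone


def render_block_list(indicators, max_age, now=None):
    """Return active IPv4 indicators, one per line, sorted and de-duplicated."""
    now = now or datetime.now(timezone.utc)
    active = {
        indicator["value"]
        for indicator in indicators
        if indicator["type"] == "ipv4" and now - indicator["last_seen"] <= max_age
    }
    if not active:
        return ""
    return "\n".join(sorted(active)) + "\n"


def write_block_list(path, content):
    """Replace the block list atomically so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.replace(tmp_path, path)
```

Run from a scheduler (cron or similar), this covers both import and expiration for one integration. Each additional destination would need its own renderer.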

Up next in the series: Threat Intelligence Analysis and Maintenance 

Chris Black, Sr. Sales Engineer, Anomali
Image source: Shutterstock/BeeBright