Threat Intelligence Definitions
Cyber Squared defines threat intelligence as “An emerging information security discipline that seeks to recognize and understand sophisticated cyber adversaries, specifically why and how they threaten data, networks, and business processes.”
And Gartner takes a stab at defining it: “Threat intelligence is evidence-based knowledge including context, mechanisms, indicators, implications and actionable advice...that can be used to inform decisions.”
Threat intelligence is the theme of the year in information security, but as you can see, the term’s definition is as broad as last year’s “Big Data” trend and not much more helpful. The need for a new term comes from our industry’s realization that Big Data isn’t much use if it is not actionable. How do we sift through the newly available data? And what can we learn from it? Well, by using threat intelligence.
What we learn from the data, and how we make those insights actionable, depends on whom you ask. The goal of all this data collection is to be able to make better decisions about how to protect our systems. To mitigate threats we can implement new controls, remediate vulnerabilities, or accept risks. Which of these actions we take, at what rate, and in what order are the questions that threat intelligence should enable any well-informed security program to answer.
Threat intelligence can describe any stage of how a breach happens: the threat actor, the threat action, the malicious code (or lack thereof), or the breach itself.
Steve King’s definition is the best I’ve found so far. He segments threat intelligence into five data types:
- Internal Intelligence. This is the intelligence about your organization’s own assets and behavior, based on analysis of your organization’s activities.
- Network Intelligence. This is intelligence gleaned from analyzing network traffic at your organization’s network boundary and on networks that connect you to the outside world. FireEye is a good example.
- Edge Intelligence. This is an understanding of what various hosts on the Internet are doing at the edge of the network. This information is available from governments, ISPs, and telecoms. For example, Akamai has a lot of intel on the edge of the Internet.
- Open-Source Intelligence. This comes from the plethora of information available on websites, blogs, Twitter feeds, chat channels, and news feeds. It’s available to whoever wants to collect and mine it for useful intel.
- Closed-Source Intelligence. This is the most difficult to acquire: closed user group sharing (for example, FS-ISAC), authenticated underground websites and chat channels, information gleaned by intelligence and law enforcement operations, and human intelligence. FS-ISAC, the Financial Services Information Sharing and Analysis Center, is an industry forum for collaboration on critical security threats facing the global financial services sector.
Threat Intelligence Dynamic Model
I suggest that a different model for categorizing and thinking about threat intelligence and security data would be more useful. The model is similar to the one used in tradecraft—or sports—in collecting intelligence about an evolving adversary. In this model, every stage in the update chain informs the next, and the cycle is dynamic and constantly ongoing. New Threat Actors generate new successful attacks, and an evolution of possible strategies (hopefully) changes our system topology to be ready for the new threats.
King’s breakdown is the most useful in terms of categorizing where the data comes from, but I suggest that a better frame would be to think about what the data describes. That is, what does the data describe in our model?
Answer: not everything. This happens in two ways. First, threat intelligence is evidence-based, and at best, we collect some ground-truth metrics about each of the above. For example, Possible Strategies (that is, vulnerability definitions) are at best quantified by the National Vulnerability Database and any zero-day feeds one might be able to correlate to their environment, but this data is only descriptive of what we’ve collected and enumerated, not of the entire landscape of possible strategies. This is a case of not enough breadth or manpower in collection efforts. Second, there is another bit that we miss when we talk about security data or threat intelligence - data we haven’t thought to collect yet, or evidence that’s out there that we’re either not collecting or not integrating into our analysis.
First, let’s see what we’re collecting:
Information about threat actors themselves is usually hard to operationalize, but knowing the locations of malicious IPs or command-and-control (C2) servers lets you implement a mitigating control around that known information. This type of data attempts to capture the location or past actions of threat actors, and is necessary but hardly sufficient to stop the threat. Buying more blacklist data has diminishing returns, since attackers shift their locations and learn to evade detection methods. A better way to think about this type of data comes from Alex Pinto and Kyle Maxwell, whose Black Hat talk on measuring the quality of threat intelligence offers a way to determine the quality of your data source.
MITRE and NVD provide the Common Vulnerabilities and Exposures (CVE) list, which is a vast but incomplete picture of the possible strategies. However, this is not an exhaustive dictionary. It is maintained by an organization with limited resources attempting to streamline a process. Some types of vulnerabilities are prioritized over others, and some never make it in because of resource limitations. Some are never submitted, so we look to providers of zero-day vulnerabilities for the undisclosed or unpatchable vulnerabilities. Learning what’s possible for the attacker informs us about what we must defend against.
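To make the blacklist idea concrete, here is a minimal sketch of checking observed connections against a list of known-bad networks. The blocklist entries and connection records are made up for illustration (they use reserved documentation address ranges), and the function names are my own, not from any particular product or feed.

```python
import ipaddress

# Hypothetical blocklist entries; a real feed would supply these.
BLOCKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),    # reserved documentation range
    ipaddress.ip_network("198.51.100.17/32"),  # single made-up host
]

def is_blocklisted(ip):
    """Return True if the address falls inside any blocklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKLIST)

def flag_connections(connections):
    """Return the (source, destination) pairs whose destination is blocklisted."""
    return [(src, dst) for src, dst in connections if is_blocklisted(dst)]

# Made-up outbound connections: (internal source, external destination).
observed = [
    ("10.0.0.5", "203.0.113.44"),
    ("10.0.0.8", "192.0.2.10"),
    ("10.0.0.9", "198.51.100.17"),
]
flagged = flag_connections(observed)
print(flagged)
```

A control like this is only as good as the list behind it, which is why the quality of the feed matters more than its size.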
However, it’s impossible to be secure against _every_ vulnerability, so let’s filter down further:
There are a few ways that we can know about actual ongoing attacks. The simplest is rule-based: an IDS picking up a known-exploit signature tells you an attack is in progress. More complex systems do anomaly detection, and SIEM systems record all the actions undertaken in order to provide early warnings or quick detection (that is, to mitigate damage). In the aggregate and in retrospect, this data informs us about the worst of vulnerabilities: those which attackers are actually attempting to exploit (that is, those that put us most at risk). AlienVault's threat detection capabilities are based on signatures and data that, in addition to the results of novel research from AV Labs, are derived from Open Threat Exchange (OTX). Customers of USM have access to AlienVault Labs' Threat Intelligence Update, which synthesizes data from OTX into automated defenses for their deployment. It's a classic example of how you can refine the raw threat data in OTX into powerful, pertinent defenses against new and dangerous threats.
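The rule-based case can be sketched in a few lines. This is a toy illustration, assuming signatures are plain regular expressions applied to request log lines; the signature names, patterns, and log entries are all made up and are not taken from any real IDS rule set or from OTX.

```python
import re

# Hypothetical signatures (name -> pattern); real IDS rule sets are far richer.
SIGNATURES = {
    "path-traversal": re.compile(r"\.\./\.\./"),
    "sql-injection": re.compile(r"union\s+select", re.IGNORECASE),
}

def match_signatures(line):
    """Return the names of all signatures that match a log line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]

def scan(lines):
    """Return (line_number, matched_names) for every line that triggers a rule."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        hits = match_signatures(line)
        if hits:
            alerts.append((number, hits))
    return alerts

# Made-up request log lines.
log = [
    "GET /about.html",
    "GET /download?file=../../etc/passwd",
    "GET /item?id=1 UNION SELECT password FROM users",
]
alerts = scan(log)
for number, hits in alerts:
    print(number, hits)
```

Counting which signatures fire over time is what turns individual alerts into the aggregate, retrospective view of what attackers are actually attempting.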
Information about your business, accurate asset inventory and grouping, accurate assessments of which mitigating controls are in place - all of these slice out a significant amount of risk from your system. Some vulnerabilities simply don’t apply to your business because of your topology, or the way in which some attacks must be implemented. We can slice those out by taking a look at the difference between what is possible in the abstract for the attacker, and what is actually feasible on your network.
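As a rough sketch of the difference between what is possible in the abstract and what is feasible on your network, the snippet below keeps only the vulnerabilities whose affected product appears in an asset inventory. The CVE identifiers, product names, and hosts are placeholders I invented for illustration, and a real inventory would also need versions and mitigating controls.

```python
# Hypothetical data: which product each (placeholder) vulnerability affects.
POSSIBLE = {
    "CVE-0000-0001": "webserver-x",
    "CVE-0000-0002": "database-y",
    "CVE-0000-0003": "mail-z",
}

# Hypothetical asset inventory: host -> set of installed products.
INVENTORY = {
    "web01": {"webserver-x"},
    "db01": {"database-y", "webserver-x"},
}

def feasible_vulns(possible, inventory):
    """Map each vulnerability to the hosts that actually run the affected product.

    Vulnerabilities with no matching host are dropped: they are possible
    in the abstract but not feasible on this network.
    """
    result = {}
    for cve, product in possible.items():
        hosts = sorted(h for h, products in inventory.items() if product in products)
        if hosts:
            result[cve] = hosts
    return result

feasible = feasible_vulns(POSSIBLE, INVENTORY)
print(feasible)
```

Here the mail server vulnerability drops out entirely, because nothing in the inventory runs that product.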
However, all of the previous types of data only attempt to capture bits about the ways attackers behave. They’re not a complete picture, nor is my model necessarily the best one at describing everything about an attacker’s methodology. By contrast, successful attacks, or our past mistakes, allow us to get to the heart of the issue. We’re trying to prevent breaches, and historical data about what’s been successful (we’re currently tracking around 500 CVEs that have had successful attacks against them in the past six months) allows us to be certain that our remediation efforts don’t go to waste. Moreover, this is endgame data: the outcomes we seek to predict with the other data sources are captured by the successful attack dataset. Next, we can determine what about the environment caused the attack to be successful, where successful attacks come from, which potential attacks they are generated by, and so on. In other words, we can reverse engineer the other three data sources from this one, or use the other ones to augment our understanding of what types of attacks lead to attacker success.
The prior approach to assessing threat intelligence, grouping by data origination, measures operational efficiency. Namely, it measures how good your data sources are, and how good your security team is at using or operationalizing this data. This is useful, and should be a part of any discussion on threat intelligence or a mature security practice. Our approach instead attempts to describe the effectiveness of the threat itself, and hence the amount of risk that a remediation mitigates. There’s value in all of the aforementioned types of threat intelligence. Value can be derived in clever ways, as in this example (https://securityblog.verizonenterprise.com/?p=6522, no longer available) of contextual graph analysis on simple domain data. Sometimes, it’s just smarter to use the simple insights, and look at what’s worked in the past.
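One simple way to put successful-attack data to work is to order open vulnerabilities by how often each has been tied to a real breach. The sketch below assumes breach records are dictionaries with a "cve" field; that record shape, the placeholder CVE identifiers, and the counts are all invented for illustration and are not drawn from our dataset.

```python
from collections import Counter

def prioritize(open_vulns, breach_records):
    """Order vulnerabilities by number of observed successful attacks, highest first.

    Ties (including vulnerabilities with no observed attacks) are broken
    alphabetically so the ordering is deterministic.
    """
    counts = Counter(record["cve"] for record in breach_records)
    return sorted(open_vulns, key=lambda cve: (-counts[cve], cve))

# Placeholder identifiers and made-up breach history.
open_vulns = ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"]
breaches = [
    {"cve": "CVE-0000-0002"},
    {"cve": "CVE-0000-0002"},
    {"cve": "CVE-0000-0003"},
]
ranked = prioritize(open_vulns, breaches)
print(ranked)
```

A raw count is the crudest possible score, but it captures the point: remediation effort goes first to what has already worked for attackers.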
Michael Roytman on Twitter (he tweets mainly all-things related to data science): https://twitter.com/mroytman