
What Can We Know About Cyber Security Data?


Jamie Collier and Brandon Valeriano

(This post was written with Jamie Collier and cross-posted on his blog Cyber Security Relations here. Check out his stuff. This post might be a bit too mean to Norse for a mass-market website, so RI gets the benefit.)

With Norse, a cyber threat intelligence firm, imploding due to a lack of confidence in the company’s data and other associated problems, there is clear cause for concern about the nature of cyber security incident data. Although we should not jump to conclusions (Norse’s failings are unlikely to represent problems in the cyber intelligence industry more broadly), the episode does nonetheless raise questions. There are two lessons to learn from Norse and the use of cyber security data.

First, the quality of data matters. Norse had over eight million sensors, actively collecting from honeypot systems the company had placed around the world. These honeypots are mechanisms set up to detect cyber attacks. Those installed by Norse were supposedly able to adapt in order to mimic different types of systems, thereby picking up a larger range of potential attacks. However, the data these sorts of platforms provide are largely superficial and untested. Real-time information is useful, but only if it is combined with both other information sources and sophisticated analysis. Although Norse had a high volume of raw, real-time data, the firm was unable to go any further (for example, by providing context for attacks or a sense of broader trends). Robert M. Lee has elaborated on this point, outlining the crucial differences between threat data and threat intelligence.

Second, the failings of Norse highlight a clear disparity between what might look interesting to media outlets or board members with no background knowledge of cyber security, and what is actually useful to the technically minded information security community. Although infographics such as Norse’s attack map, which displayed ‘live’ attacks, catch attention, they do not necessarily provide any useful information or insight. Of course, data visualisation can serve a range of useful functions. For computer emergency response teams, visualisation tools can provide a clear overview of an attack (what assets are being targeted, the timeline of events, etc.). Likewise, for insider threat detection, visualisation can highlight anomalies or strange employee behaviour that may be worth exploring. Crucially, the failings at Norse have shown that where visualisation tools are concerned, it is the substance of the data that is most important.

These limitations in cyber security data do not mean that we cannot know anything about the domain. Massive amounts of information can be culled from indicators such as network traffic data and CERT monitoring, but this needs to be done very carefully, likely in an open and collaborative environment of the kind better suited to academia than to cyber security firms.

As Valeriano and Maness (2015) demonstrate, we can know who the major threat actors are, what their goals are, and what methods they have used in the past. It is possible to understand the scope of the international cyber security landscape, but at this point they have only investigated the dynamics of cyber conflict between rivals. There are a few caveats to this: all data is based on publicly available information.

It is often said that cyber tactics are covert, but this is not entirely true. Much of what happens in cyberspace is the definition of overt: by interacting with external networks, threat actors make their presence known. They can try to mask their origins, but language traits, common techniques and malware, and motive plus historical context give us a great deal of information about who is attacking whom. The attribution problem is often overstated; what is beyond our ability is obtaining real-time data that can be used to charge culprits in the act based on domestic legal standards.

What is covert is information on ongoing infiltrations and zero-day threats (covert by definition). Absent someone like Snowden dumping a massive amount of classified information, actions taken to monitor but not manipulate are often hidden, and this issue will continue to be unexplored by current detection methods.

In short, there is much we can know about the cyber security domain that can be gleaned from data. Operating in this landscape as if threats cannot be known, monitored, and predicted betrays the great advances we have made in doing exactly this. What we cannot do is watch ongoing operations as they occur. This is mainly because organizations might not know they have been breached until after the fact, as was the case for the Office of Personnel Management. Companies like Norse might promise a great deal of information, but this is likely to be a promise that they cannot keep. There is clearly great utility in cyber security data, but we must temper expectations, pairing enthusiasm with collaborative analysis and a sober view of what data-based efforts can deliver.


Author: Brandon Valeriano

Brandon Valeriano (Ph.D. Vanderbilt University, 2003) is a Senior Lecturer in Global Security at the University of Glasgow in the School of Social and Political Sciences. Dr. Valeriano’s main research interests include investigations of the causes of conflict and peace as well as the study of race/ethnicity from the international perspective. His book Cyber War versus Cyber Realities is due to be released soon on Oxford