What is big data, anyway?
If you haven’t been living in a cave the last five years, you have no doubt run across the phrase “big data” as an IT hot topic. But like so many other terms — “cloud” comes to mind — basic definitions, much less useful discussions of big data security issues, are often missing from the media accounts. So let’s begin with some context.
What makes data big, fundamentally, is that we have far more opportunities to collect it, from far more sources, than ever before. Think of all the billions of devices that are now Internet-capable; smartphones and Internet of Things sensors are only two examples. Every one of those devices is also a potential source of big data security issues.
“Big data” emerges from this incredible escalation in the number of IP-equipped endpoints. It is really just the term for all the available data in a given area that a business collects with the goal of finding hidden patterns or trends within it. These, once revealed by analytics tools, can be leveraged to yield an improved outcome down the road (higher customer satisfaction, faster service delivery, more revenue, and so forth).
The flip side of that coin is that the architecture used to store big data is also a shiny new target for criminal activity and malware. Should something happen to such a key business resource, the consequences could be devastating for the organization that gathered it.
Unfortunately, many of the open source tools associated with big data and smart analytics were not designed with security as a primary function, leading to yet more big data security issues.
The nine key big data security issues
So, with that in mind, here’s a shortlist of the most obvious big data security issues, along with the technologies involved, that should be considered.
- Distributed frameworks. Most big data implementations actually distribute huge processing jobs across many systems for faster analysis. Hadoop is a well-known instance of open source tech involved in this, and originally had no security of any sort. Distributed processing may mean less data processed by any one system, but it means a lot more systems where security issues can crop up.
- Non-relational data stores. Think NoSQL databases, which by themselves usually lack security; protection is instead provided, sort of, via middleware.
- Storage. In big data architecture, the data is usually stored on multiple tiers, depending on business needs for performance vs. cost. For instance, high-priority “hot” data will usually be stored on flash media. So locking down storage will mean creating a tier-conscious strategy.
- Endpoints. Security solutions that draw logs from endpoints will need to validate the authenticity of those endpoints, or the analysis isn’t going to do much good.
- Real-time security/compliance tools. These generate a tremendous amount of information; the key is finding a way to ignore the false positives, so human talent can be focused on the true breaches.
- Data mining solutions. These are the heart of many big data environments; they find the patterns that suggest business strategies. For that very reason, it’s particularly important to ensure they’re secured against not just external threats, but insiders who abuse network privileges to obtain sensitive information – adding yet another layer of big data security issues.
- Access controls. Just as with enterprise IT as a whole, it’s critically important to provide a system in which encrypted authentication/validation verifies that users are who they say they are, and determines who can see what.
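To make the endpoint point above a little more concrete, here is a minimal sketch of one way a log collector might check that a log record really came from a known endpoint, using an HMAC over the record. The endpoint ID, the shared secret, and the `ENDPOINT_KEYS` table are all hypothetical; a real deployment would keep keys in a secrets manager and might well use certificates instead.

```python
import hashlib
import hmac

# Hypothetical per-endpoint shared secrets (illustration only; never hard-code real keys).
ENDPOINT_KEYS = {"sensor-001": b"demo-secret-1"}


def sign_log(endpoint_id: str, payload: bytes, key: bytes) -> str:
    """Compute the signature an endpoint would attach to a log record."""
    message = endpoint_id.encode() + b"|" + payload
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_authentic(endpoint_id: str, payload: bytes, signature: str) -> bool:
    """Accept a record only if it is signed with the key registered for that endpoint."""
    key = ENDPOINT_KEYS.get(endpoint_id)
    if key is None:
        return False  # unknown endpoint: reject outright
    expected = sign_log(endpoint_id, payload, key)
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, signature)
```

Records that fail this check can be dropped or quarantined before they reach the analytics layer, so that forged or altered logs do not skew the analysis.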
Finally, some specific thoughts on the data itself:
- Granular auditing can help determine when missed attacks have occurred, what the consequences were, and what should be done to improve matters in the future. This in itself is a lot of data, and must be enabled and protected to be useful in addressing big data security issues.
- Data provenance primarily concerns metadata (data about data), which can be extremely helpful in determining where data came from, who accessed it, or what was done with it. Usually, this kind of data should be analyzed with exceptional speed to minimize the time in which a breach is active. Privileged users engaged in this type of activity must be thoroughly vetted and closely monitored to ensure they don’t become their own big data security issues.
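As a rough illustration of what "protected" auditing could look like, the sketch below chains each audit entry to the previous one with a SHA-256 hash, so that any later edit to an earlier entry becomes detectable. This is one possible technique, not a description of any particular product, and the field names (`user`, `action`, `resource`) are assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    body = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + body).hexdigest()


def append_entry(log: list, user: str, action: str, resource: str) -> None:
    """Append an audit record that is linked to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"user": user, "action": action, "resource": resource}
    log.append({"record": record, "hash": _digest(prev_hash, record)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

The same records double as simple provenance metadata, since each one says who touched which resource and what they did. The chain only shows that tampering happened; it does not prevent it, so the log itself still needs access controls.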
Addressing big data security issues
In a perfect world, all nine areas of big data security issues would be comprehensively secured. In the real world, approximations may be required because the data collection and analysis tools have security that was “bolted onto” the core functionality instead of being “baked in.” Security information and event management (SIEM) solutions should always be deployed to aggregate security logs and automatically identify potential breaches (which also means, of course, that logging should be as comprehensive as possible).
One particular point of concern, which is why I listed it first above, is Hadoop, which was simply not originally designed to address big data security issues in any way at all. Fortunately, as Hadoop has become more popular, a variety of leading security solution providers have developed commercial-grade technology to help lock it down, and these are joined by contributions from the open source world, like Apache Accumulo.
So these days it is at least possible to shore up some of the more egregious security shortfalls of Hadoop (and similar products) that remain in areas like encryption and authentication.
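To give a feel for the kind of correlation a SIEM performs, here is a toy sketch that pools events from several systems and flags any source with repeated failed logins. It is far simpler than a real SIEM product; the event fields (`system`, `source`, `event`) and the threshold of three are assumptions chosen for the example.

```python
from collections import Counter


def flag_suspicious_sources(events, threshold=3):
    """Return sources with at least `threshold` failed logins across all systems."""
    failures = Counter(
        e["source"] for e in events if e["event"] == "login_failed"
    )
    return sorted(src for src, count in failures.items() if count >= threshold)


# Hypothetical events gathered from two different systems.
sample_events = [
    {"system": "hadoop", "source": "10.0.0.5", "event": "login_failed"},
    {"system": "hadoop", "source": "10.0.0.5", "event": "login_failed"},
    {"system": "nosql", "source": "10.0.0.5", "event": "login_failed"},
    {"system": "nosql", "source": "10.0.0.9", "event": "login_failed"},
    {"system": "hadoop", "source": "10.0.0.9", "event": "login_ok"},
]
suspicious = flag_suspicious_sources(sample_events)
```

Neither system on its own sees three failures from `10.0.0.5`; the pattern only shows up once the logs are aggregated, which is the argument for comprehensive logging.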
So what lies ahead for big data security issues?
The future of big data itself is all but guaranteed to be a bright one; it’s widely recognized these days that smart analytics can be a royal road to business success. This implies that big data architecture will become both more critical to secure and more frequently attacked, thus growing the list of big data security issues.
Furthermore, as more data is aggregated, privacy concerns will strengthen in parallel, and government regulations will be created as a result. More and more, the question “What is happening to my data, and where does it go?” will be asked not just in business and in government, but by everyday citizens worldwide.
Yet despite this, it’s hard to find security specialists who focus on big data security issues per se — largely because, historically, smart analytics and security haven’t always been ideal companions.
There is, however, a silver lining in the cloud. Just as smart analytics tools can drive new business strategies, they can also drive superior security — given enough of the right information from the infrastructure, and the right algorithms to process it. And that, in a nutshell, is the basis of the emerging field of security intelligence, which correlates security info across disparate domains to reach conclusions. The solutions available, already smart, are rapidly going to get smarter in the years to come.