How to make the most of Big Data
Mick James gets to grips with the "three Vs" to pinpoint how best to find and protect the small data needle in the big data haystack
We live in a world where the quantities of data being produced defy the imagination: the Large Hadron Collider generates 30 petabytes of data a year – the equivalent of more than six million DVDs – in a search for just a handful of interesting events that could change our view of the universe. But businesses are not far behind in the volumes of data they deal with. Combined with the falling cost of storage and increasing processing power, this growth offers the promise that "Big Data" can yield insights that will drive innovation and competitive advantage.
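The DVD comparison can be checked with a quick calculation. The sketch below assumes decimal units (1 petabyte = 10^15 bytes) and a single-layer DVD capacity of 4.7 GB; neither figure is stated in the article, so treat them as illustrative assumptions.

```python
# Assumptions (not from the article): decimal units and single-layer 4.7 GB DVDs.
PETABYTE = 10**15             # bytes
DVD_CAPACITY = 4.7 * 10**9    # bytes

lhc_bytes_per_year = 30 * PETABYTE
dvds_per_year = lhc_bytes_per_year / DVD_CAPACITY

print(f"{dvds_per_year:,.0f} DVDs per year")
```

Under those assumptions the result is roughly 6.4 million discs, consistent with "more than six million".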
"It's a question of how you find the small data needle in the big data haystack, how to distinguish between the mass of information that they are generating and that people are generating about themselves," says Paul Connolly, head of the think tank at the Management Consultancies Association. "Where are you really able to provide some differentiation, rather than just mapping a lot of stuff?"
It's hard to pinpoint exactly when the supply of information went from famine to feast, but as Harvey Lewis, head of research at Deloitte Analytics, points out, there are papers from the 1960s on techniques for dealing with the "information explosion".
"Very few of the things we are tackling now are genuinely new," he says. "Data volumes are increasing exponentially, but the thing is that no matter where you are on an exponential curve, it looks like you are facing an exponential increase."
Big Data and the EU
The European Union's Data Protection Directive has been seen as a threat to Big Data, particularly in the restrictions it places on the use of publicly available personal data.
However, the EU argues that it will in fact strengthen the contribution of Big Data by making the EU the most trusted and legally protected digital marketplace.
Data portability – the right of citizens to move personal data from one provider to another – will encourage start-ups. On the other hand, the directive will discourage automated profiling, which can lead to "discrimination, exclusion and loss of control". Some banks are already using social network data for credit scoring, but now organisations will have to be more transparent about how they use personal data and individuals will have a right to erase certain data and "be forgotten".
The three Vs of Big Data
Formal definitions of Big Data vary, but the most commonly used is the Gartner Group's: "…high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information-processing for enhanced insight and decision making."
Summed up as the "three Vs", this captures the insight that what matters is not simply the amount of data or the speed at which it becomes available, but also its variety: the same entity – such as a customer – will be present in many different data sets, even within a single organisation.
This is where the early benefits of Big Data have been felt. Retail, banking and financial services organisations see the opportunity to pull information out of different silos to create a "single view" of the customer. In its Big Data Alchemy report, Capgemini noted that where this approach has been adopted it has led to greater conversion rates of prospects, greater "share of wallet" from existing customers and lower attrition rates.
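A "single view" of this kind amounts to merging records from separate systems on a shared customer key. The following is a minimal sketch, not any bank's actual approach: the silo names, fields and customer IDs are all hypothetical, and it assumes every silo already uses the same customer identifier, which in practice is often the hardest part.

```python
from collections import defaultdict

# Hypothetical records from three separate silos, keyed by a shared customer ID.
current_accounts = [{"customer_id": "C001", "balance": 2400},
                    {"customer_id": "C002", "balance": 150}]
mortgages = [{"customer_id": "C001", "outstanding": 180000}]
card_spend = [{"customer_id": "C002", "monthly_spend": 620},
              {"customer_id": "C003", "monthly_spend": 75}]

def build_single_view(**silos):
    """Merge records from each named silo into one profile per customer."""
    view = defaultdict(dict)
    for silo_name, records in silos.items():
        for record in records:
            customer_id = record["customer_id"]
            for field, value in record.items():
                if field != "customer_id":
                    # Prefix with the silo name so fields from different systems never collide.
                    view[customer_id][f"{silo_name}.{field}"] = value
    return dict(view)

single_view = build_single_view(
    current_account=current_accounts, mortgage=mortgages, card=card_spend
)
for customer_id, profile in sorted(single_view.items()):
    print(customer_id, profile)
```

Customers who appear in only one silo still get a profile, so gaps in the relationship are visible rather than silently dropped.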
Tackling customer churn shows the "three V" approach in action: one bank analysed 200 variables across two million customers to predict which ones were likely to switch banks and was able to target them before they had taken any action.
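The article does not say which technique the bank used. As one illustration of how many variables can be turned into a list of customers to contact, the sketch below scores each customer with a logistic function over three made-up variables. The variable names, weights and threshold are all hypothetical; a real model would learn its weights from historical switching data and use far more inputs.

```python
import math

# Hypothetical weights; a real model would learn these from historical data.
WEIGHTS = {
    "months_since_last_login": 0.30,
    "complaints_last_year": 0.80,
    "products_held": -0.60,
}
BIAS = -2.0

def churn_probability(customer):
    """Logistic score: maps a weighted sum of variables to a value between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1 / (1 + math.exp(-z))

def customers_to_target(customers, threshold=0.5):
    """Return IDs of customers at or above the threshold, riskiest first."""
    scored = [(churn_probability(c), c["id"]) for c in customers]
    return [cid for p, cid in sorted(scored, reverse=True) if p >= threshold]

customers = [
    {"id": "C001", "months_since_last_login": 0, "complaints_last_year": 0, "products_held": 3},
    {"id": "C002", "months_since_last_login": 6, "complaints_last_year": 2, "products_held": 1},
]
print(customers_to_target(customers))
```

Here only the second customer, who has been inactive for months and has complained twice, crosses the threshold.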
The ability to look at everything and everyone, rather than sample sets of data, is a feature of Big Data. Deloitte's report on retail banking networks, Bricks and Clicks, looked at multiple factors affecting the use of 10,400 bank and building society branches in England and Wales.
The results were counterintuitive: the fastest adopters of online technologies were older people, and the most vulnerable branches were in retirement areas. Conversely, in towns and cities, the expectation was that younger people would not need branches, but in fact they were at the life stage of taking out mortgages and making investments, where they needed a face-to-face conversation.
"Because we looked at all the data we were able to create insights that wouldn't have been obvious if we were looking at a single brand," says Lewis.
There are also security and data protection issues, however. Bringing together a mass of information in one place creates a potential goldmine for hackers, and there is a much larger "attack surface" because of the number of exit and entry points.
And there are more subtle threats, according to Scott Sinclair of KPMG's cyber security advisory team. He says: "An important question – at least for those organisations that do not have a dedicated analytics function – is that whereas it is clear who owns the input data, who owns the outputs, which bring together data from many different silos, is not so clear. The output from the analysis is a much more sensitive data set, but less mature organisations may not be sure of the data that is being produced and are perhaps not protecting it in the right way."
On the plus side, Big Data is now playing a vital role in security: for example, insider threat detection has typically relied on detecting anomalous behaviour, something that is becoming ever harder to define.
"These tools generate so many false positives and background noise that it becomes very difficult to identify the real [criminal] insider activity," says Sinclair. "By bringing in data sets that weren't considered… you can give a much more accurate picture of what that user's behaviour normally looks like and how it deviates from the norm."
As well as helping the organisation detect and predict threats such as malware outbreaks and hacking attempts, the data collected can act as the foundation for other Big Data initiatives.
Sinclair says: "Rather than just running the security platform doing the same thing every day, you can engage an analytics function, data scientists who can start to look at data and understand what it means from a business context, start to do the freehand work to see where other parts of the business can benefit."
This is an important principle, he says: "You need to begin with clear objectives about what questions the business is trying to answer and then use the data to address those questions. Once you've got those data sets then you can say, here's some other stuff we can start to develop. If you just grab all the data and look for something interesting you will fail."
ICAS has a range of tools and resources to assist in keeping data secure.