Gartner has a term for information that is routinely gathered but rarely used: dark data. This is information collected for a direct purpose (like processing an online transaction) but then never used for anything else. By IDC estimates, dark data represents about 90% of the data collected and stored by organizations.
The Internet of Things (specifically) and digital transformation (more generally) are business initiatives that try to harness that dark data by incorporating new or previously untapped data streams into larger business processes.
Big data refers to that new influx of data. The “big” adjective can be a bit misleading: it doesn’t necessarily mean massive amounts of data. Some organizations may be dealing with petabytes of data, while others may only have gigabytes. What matters is not a given amount of data, but rather the scale of increase from previous data streams.
Regardless of the absolute amount of data, big data has several consistent characteristics. The 3 Vs of big data (volume, velocity, and variety) were originally described by analyst Doug Laney, then at META Group, which later became part of Gartner; a recent blog post from Impact Radius expanded that to seven. For a general definition, I think five of those Vs are relevant:
- Volume (a large number of transactions per second)
- Velocity (frequent transactions)
- Variety (different data formats from different sources)
- Variability (unpredictability in the data being returned)
- Veracity (the accuracy of the data, either coming in or in cache)
That’s the what of big data. An IoT environment may add new connected devices, services, or mobile applications, but an IoT initiative is not a requirement for a business to begin accessing and using its dark data. You don’t need new data streams for big data to be valuable; big data, as a strategy, can mean taking existing data streams and extending their utility.
It’s data with a purpose.
For example, an ecommerce application may store information about items still in a shopping cart. Previously, that data may have been used only to preserve the shopping cart for a future visit. However, a different approach could treat it as a new data stream and use it for a support contact, to make other product recommendations, or to look at overall shopping patterns for particular items.
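To make the shopping cart example concrete, here is a minimal Python sketch of reusing saved-cart records for shopping patterns and product recommendations. The record layout, the sample data, and the function names are all hypothetical, and simple co-occurrence counting stands in for whatever recommendation method a real system would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical saved-cart records: the same data the shop already keeps
# so that a cart can be restored on a later visit.
saved_carts = [
    {"customer": "c1", "items": ["tent", "sleeping bag", "stove"]},
    {"customer": "c2", "items": ["tent", "stove"]},
    {"customer": "c3", "items": ["sleeping bag", "lantern"]},
]


def items_left_in_carts(carts):
    """Shopping pattern: count how many saved carts contain each item."""
    return Counter(item for cart in carts for item in set(cart["items"]))


def co_occurrence(carts):
    """Count how often two items sit in the same saved cart."""
    pairs = Counter()
    for cart in carts:
        for a, b in combinations(sorted(set(cart["items"])), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(item, carts, limit=3):
    """Suggest the items most often carted together with `item`."""
    scores = Counter()
    for (a, b), count in co_occurrence(carts).items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [name for name, _ in scores.most_common(limit)]


if __name__ == "__main__":
    print(items_left_in_carts(saved_carts))
    print(recommend("tent", saved_carts))
```

Nothing new is collected here; the stored carts are simply read a second time for a different purpose.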
This calls for a new approach to data. Data is still immediate, whether for online transactions or real-time analytics, but it also has both historical and predictive applications. McKinsey, in a popular report on big data, identified five different ways that big data can provide value. The first is the broadest: simply making information more readily accessible. The second is performance information across a variety of different areas, from inventory management to personnel issues to user experience. The third is creating targeted customer information through more effective market segmentation and understanding. The last two are predictive, both in developing products and services (based on customer and market patterns) and in making better business decisions and executing business logic.
From an architectural standpoint, big data requires a technology platform that allows integration and automation at the data layer. Integration allows data from different sources and in different formats to be used transparently; this can be done by adding an extra architectural layer for an in-memory data grid, through an integration platform like Red Hat Fuse, or even through API management.
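As a rough illustration of integration at the data layer, the Python sketch below maps two differently formatted sources onto one common record shape so that downstream code does not need to know where a record came from. The feeds, field names, and target schema are invented for the example; a data grid or an integration platform would do this at far larger scale and with its own tooling.

```python
import csv
import io
import json

# Hypothetical inputs: a web order feed in JSON and a point-of-sale export in CSV.
JSON_FEED = '[{"orderId": "A1", "sku": "tent", "qty": 2}]'
CSV_EXPORT = "order,product,quantity\nB7,stove,1\n"


def from_json(text):
    """Map JSON order records onto the common schema."""
    for row in json.loads(text):
        yield {
            "order_id": row["orderId"],
            "sku": row["sku"],
            "quantity": int(row["qty"]),
            "source": "web",
        }


def from_csv(text):
    """Map CSV point-of-sale rows onto the same common schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "order_id": row["order"],
            "sku": row["product"],
            "quantity": int(row["quantity"]),
            "source": "pos",
        }


def integrated_view():
    """Return records from both sources in one uniform shape."""
    return list(from_json(JSON_FEED)) + list(from_csv(CSV_EXPORT))


if __name__ == "__main__":
    for record in integrated_view():
        print(record)
```

Each source gets a small adapter, and everything after the adapters works against a single format.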
Automation applies defined business logic automatically, which makes the overall infrastructure more responsive. While that responsiveness can be directed externally through customer interactions, logic can also be applied internally for process automation, event processing, and analytics. This would use a BPM or business rules engine in conjunction with the data layer.
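The idea of applying defined business logic automatically can be sketched as a tiny rule table in Python. This is only an illustration under assumed event shapes and thresholds; the rule names, fields, and actions are hypothetical, and a real BPM or business rules engine offers much more (rule authoring, versioning, complex event processing).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # does the rule apply to this event?
    action: Callable[[dict], str]      # what to do when it applies


# Hypothetical business rules evaluated against events from the data layer.
RULES = [
    Rule(
        "low_stock",
        lambda e: e.get("type") == "inventory" and e["on_hand"] < 5,
        lambda e: f"reorder {e['sku']}",
    ),
    Rule(
        "abandoned_cart",
        lambda e: e.get("type") == "cart" and e["idle_hours"] >= 24,
        lambda e: f"send reminder to {e['customer']}",
    ),
]


def apply_rules(event, rules=RULES):
    """Return the actions triggered by a single event."""
    return [rule.action(event) for rule in rules if rule.condition(event)]


if __name__ == "__main__":
    print(apply_rules({"type": "inventory", "sku": "tent", "on_hand": 3}))
    print(apply_rules({"type": "cart", "customer": "c1", "idle_hours": 30}))
```

The first rule is an internal process automation and the second is a customer-facing response, but both run through the same mechanism.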
Big data is not a technology; it is a strategic tool. It is a way of approaching data as an asset. And the benefits could be huge: McKinsey estimated retailers could increase their margins by 60% and industries like healthcare could introduce efficiencies saving hundreds of billions of dollars a year.