Some of the biggest technology trends aren’t necessarily about doing something new. Things like cloud computing (as an environment) and design patterns for the Internet of Things and mobile applications (as business drivers) are building on existing conceptual foundations — virtualization, centralized databases, client-based applications. What is new is the scale of these applications and the performance expected from them.
That demand for performance and scalability has driven the adoption of an architectural approach called distributed computing. Technologies under that larger umbrella use distributed physical resources to create a shared pool for a given service.
One of those technologies is the subject of this post: in-memory data grids. An in-memory data grid takes the concept of a centralized, single database and breaks it into numerous individual nodes that work together to form a grid. Gartner defines an in-memory data grid as “a distributed, reliable, scalable and … consistent in-memory NoSQL data store[,] shareable across multiple and distributed applications.” That definition captures the purpose of distributed computing services: scalable, reliable, and shareable across multiple applications.
A Look at a Distributed Architecture
Distributed computing is a very broad term that covers many different technologies and services, but a basic definition is that a particular service is distributed across, and shared among, several servers in a pool. Both frontend and backend applications interact with that pool rather than with any single server instance, so the pool can be expanded or contracted dynamically without affecting any application.
One subset of distributed computing is in-memory computing. More traditional architectures use data stores with synchronous read-write operations. This is great for data consistency and durability, but such a store can easily become a bottleneck when many transactions are waiting in the queue.
There have been significant advancements in computer hardware, especially storage devices (like solid state drives). Storage capacity is proportionally cheaper now than it was a few years ago, and hardware quality is better. Additionally, changes in operating environments (e.g., cloud) and business initiatives (e.g., the Internet of Things) are pushing for highly responsive, data-rich applications.
In-memory computing adds a layer to the environment that uses the random access memory (RAM) on the physical systems to hold most or all of the data required by client applications. Many (though not all) in-memory computing technologies are data-related, including data grids, complex event processing, and analytics.
With a data grid, that layer sits between the application and the data store. An in-memory data grid keeps a cache of frequently accessed data in active memory and accesses the backend data store only as needed, even asynchronously, to send and receive updates.
Using a data grid moves data closer to the endpoints where users interact with it. This increases responsiveness and, in some cases, can cut transaction times from hours to fractions of a second.
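To make the idea of a pool more concrete, here is a minimal sketch of one common way a distributed store can spread keys over a set of servers: consistent hashing. The `HashRing` class and the node names are hypothetical, and real data grid products use their own, more elaborate partitioning schemes. The point it illustrates is that clients ask the pool "who owns this key?" instead of addressing one server, and adding a node moves only the keys that the new node takes over.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring that assigns keys to pool members."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual points per node, for a smoother key balance
        self._points = []      # sorted hash positions on the ring
        self._owners = {}      # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owners[point]

    def owner(self, key: str) -> str:
        """Return the node owning key: the first ring point clockwise from its hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"order:{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # expand the pool; clients keep calling ring.owner()
moved = sum(1 for k in keys if ring.owner(k) != before[k])
print(f"{moved} of {len(keys)} keys moved to node-d")
```

With a naive `hash(key) % number_of_nodes` scheme, growing from three to four nodes would reassign most keys; here only the share claimed by `node-d` moves, which is what lets a pool expand without disturbing the applications using it.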
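The read and write paths described above can be sketched as a read-through cache with write-behind: reads are served from memory and fall back to the backend only on a miss, while writes update memory immediately and are flushed to the backend by a background worker. `BackendStore` and `GridCache` are hypothetical names, and this single-process sketch leaves out the distribution, eviction, and failure handling that a real data grid provides.

```python
import queue
import threading
import time


class BackendStore:
    """Hypothetical slow data store standing in for a database."""

    def __init__(self, latency: float = 0.01):
        self.latency = latency
        self.rows = {}
        self.reads = 0

    def load(self, key):
        time.sleep(self.latency)  # simulate a round trip to the store
        self.reads += 1
        return self.rows.get(key)

    def store(self, key, value):
        time.sleep(self.latency)
        self.rows[key] = value


class GridCache:
    """In-memory layer: read-through on misses, write-behind for updates."""

    def __init__(self, backend: BackendStore):
        self.backend = backend
        self.memory = {}
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._flush_loop, daemon=True)
        self._writer.start()

    def get(self, key):
        with self._lock:
            if key in self.memory:
                return self.memory[key]       # hit: served from RAM
        value = self.backend.load(key)        # miss: read through to the store
        if value is None:
            return None
        with self._lock:
            # A concurrent put() may have filled the key meanwhile; keep that newer value.
            return self.memory.setdefault(key, value)

    def put(self, key, value):
        with self._lock:
            self.memory[key] = value          # readers see the update immediately
        self._pending.put((key, value))       # the backend is updated asynchronously

    def _flush_loop(self):
        while True:
            key, value = self._pending.get()
            self.backend.store(key, value)
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the backend."""
        self._pending.join()


backend = BackendStore()
backend.rows["user:1"] = {"name": "Ada"}
grid = GridCache(backend)

first = grid.get("user:1")    # goes to the backend
second = grid.get("user:1")   # served from memory
grid.put("user:2", {"name": "Grace"})
grid.flush()
print("backend reads:", backend.reads)
```

Write-behind trades durability for latency: until `flush()` (or the background worker) catches up, the newest value exists only in memory, which is one reason production grids replicate entries across nodes.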
Advantages and Uses
TechTarget identifies three attributes that indicate when data grids are most advantageous: velocity, variability, and volume. In-memory data grids are best suited for environments where a lot of data (volume) arrives simultaneously or continually (velocity) from different sources or in different formats (variability). Put another way, these environments demand performance and scalability.
From an architectural perspective, a data grid addresses scalability and performance directly:
- Dynamic horizontal scalability based on service load, without affecting either the application or the backend database configuration.
- Large-scale transaction processing (hundreds of thousands per second) in a distributed system that is fault-tolerant.
- Cloud-native architecture that is interoperable across different environments (on-premises, hosted, cloud).
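The fault tolerance in the list above usually comes from keeping more than one copy of each entry on different nodes. The sketch below is hypothetical and not any particular product's algorithm: it uses rendezvous (highest-random-weight) hashing to pick `num_owners` nodes for each key, writes the entry to all of them, and reads from the first owner that is still alive.

```python
import hashlib


def _weight(node: str, key: str) -> int:
    """Deterministic pseudo-random score for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ReplicatedGrid:
    """Hypothetical grid that stores each entry on num_owners distinct nodes."""

    def __init__(self, nodes, num_owners: int = 2):
        self.num_owners = num_owners
        self.data = {node: {} for node in nodes}  # per-node in-memory storage
        self.alive = set(nodes)

    def owners(self, key: str):
        """Rank all nodes by weight for this key and keep the top num_owners."""
        ranked = sorted(self.data, key=lambda n: _weight(n, key), reverse=True)
        return ranked[: self.num_owners]

    def put(self, key: str, value) -> None:
        for node in self.owners(key):
            if node in self.alive:
                self.data[node][key] = value

    def get(self, key: str):
        # Failed nodes stay in the ranking, so reads fall through to a surviving copy.
        for node in self.owners(key):
            if node in self.alive and key in self.data[node]:
                return self.data[node][key]
        raise KeyError(key)

    def fail(self, node: str) -> None:
        """Simulate a crash: the node and everything in its memory disappear."""
        self.alive.discard(node)
        self.data[node].clear()


grid = ReplicatedGrid(["node-a", "node-b", "node-c"], num_owners=2)
for i in range(100):
    grid.put(f"session:{i}", i)

grid.fail("node-a")  # lose one node's memory entirely
survived = all(grid.get(f"session:{i}") == i for i in range(100))
print("all entries readable after failure:", survived)
```

With two owners per key, any single node can fail without losing data. This sketch does not re-replicate afterward; a real grid would rebalance to restore the configured number of copies before a second failure.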
Data grids also have some less immediately obvious advantages that stem from how they interact with both data sources and applications.
Another way to look at the data grid is as a data abstraction layer sitting between multiple data streams and data storage devices. In some environments, a data grid could serve as an integration method for multiple data backends.
Changes in architecture, such as microservices, are also reshaping how environments ingest and respond to changing data. In more traditional architectures with discrete applications, the workflow can be very sequential: first you receive data, then you store it, then you retrieve it, then you run it through an analytics program, and finally you overlay those analytics (usually in graphs and charts) on some business logic. It is possible to cut out some of those intermediate steps. For example, the same data streams could be used both for real-time information and for data analytics, and those analytics could be fed directly from the data grid into a given application (like a BPM engine) according to defined queries. There’s no need to break out analytics as something separate; it can be part of the application.
There are some drawbacks to in-memory data grids (as with distributed computing generally): increased complexity, a shortage of engineers familiar with the technology, and a lack of standards. If data access and responsiveness are not critical for a specific application, these drawbacks may point to a more traditional solution.
Still, in-memory data grids are an important tool for realizing emerging digital transformation initiatives, because they help the critical data layer handle the velocity, volume, and variability of modern data streams.
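As an illustration of the integration idea, a grid-like facade can present a single key space while routing each key to whichever backend holds it. The `UnifiedDataLayer` class, the key-prefix routing rule, and both stand-in backends are hypothetical; real products connect backends through their own cache-store or connector mechanisms.

```python
from typing import Callable, Dict, Optional

Loader = Callable[[str], Optional[object]]


class UnifiedDataLayer:
    """Hypothetical abstraction layer: one get() API over several backends."""

    def __init__(self):
        self.cache: Dict[str, object] = {}
        self.loaders: Dict[str, Loader] = {}

    def register(self, prefix: str, loader: Loader) -> None:
        """Route keys such as 'crm:42' to the loader registered for 'crm'."""
        self.loaders[prefix] = loader

    def get(self, key: str):
        if key in self.cache:
            return self.cache[key]            # already in memory
        prefix, _, ident = key.partition(":")
        loader = self.loaders.get(prefix)
        if loader is None:
            raise KeyError(f"no backend registered for {prefix!r}")
        value = loader(ident)                 # fetch from the owning backend
        if value is not None:
            self.cache[key] = value
        return value


# Two stand-in backends: a dict playing a relational table, a function playing a remote service.
customers_table = {"42": {"name": "Acme Corp"}}
inventory_calls = []


def inventory_service(sku: str):
    inventory_calls.append(sku)               # record each remote call
    return {"sku": sku, "in_stock": 7}


layer = UnifiedDataLayer()
layer.register("crm", customers_table.get)
layer.register("inventory", inventory_service)

customer = layer.get("crm:42")
stock = layer.get("inventory:X-100")
stock_again = layer.get("inventory:X-100")    # served from the cache, no second call
```

Applications only see `layer.get(...)`; which system actually holds a record becomes a configuration detail of the layer rather than of each application.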
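One way to picture analytics running inside the data layer is a continuous query: the application registers a filter once, and the grid pushes each matching update to it as entries are written, instead of the data passing through a separate store-then-analyze pipeline. The `ContinuousQueryGrid` class, the `RegionTotals` aggregate, and the order-size threshold are hypothetical; products that offer continuous queries or event listeners each expose their own API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Listener = Callable[[str, Any], None]


class ContinuousQueryGrid:
    """Hypothetical in-memory map that notifies listeners whose query matches a write."""

    def __init__(self):
        self.entries: Dict[str, Any] = {}
        self.queries: List[Tuple[Predicate, Listener]] = []

    def add_continuous_query(self, predicate: Predicate, listener: Listener) -> None:
        self.queries.append((predicate, listener))

    def put(self, key: str, value: Any) -> None:
        self.entries[key] = value
        for predicate, listener in self.queries:
            if predicate(value):
                listener(key, value)          # push the match to the application


class RegionTotals:
    """Running aggregate that a business-process application could read directly."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.flagged = []

    def on_large_order(self, key: str, order: Dict[str, Any]) -> None:
        # Each put is treated as a new order event in this sketch.
        self.totals[order["region"]] += order["amount"]
        self.flagged.append(key)


grid = ContinuousQueryGrid()
stats = RegionTotals()
grid.add_continuous_query(lambda o: o["amount"] >= 1000, stats.on_large_order)

grid.put("order:1", {"region": "EMEA", "amount": 250.0})
grid.put("order:2", {"region": "EMEA", "amount": 1200.0})
grid.put("order:3", {"region": "APAC", "amount": 5000.0})
print(dict(stats.totals))
```

The same writes serve both purposes: `grid.entries` holds the real-time data, while `stats` is kept current as a side effect of those writes, so no separate retrieve-and-analyze step is needed.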