Scalability is one of those words that can mean very different things to different people, even in the same context or the same project. It’s not so much nuanced as it is that the definition matters on perspective — scale can be different for different goals.
There will be upcoming posts on data virtualization, in-memory data grids, integration methods — all areas where an understanding of your current and future needs, resourcing, and loads are critical for planning. Going into those concepts, it helps to understand scale — not just “make it bigger,” but how you make it bigger and when and why.
Vertical and Horizontal Scale
Vertical scale and horizontal scale are probably the most traditional approaches to scalability. Vertical scale refers to adding capacity or improving performance by adding resources to an existing system. Horizontal scale refers to increasing capacity and performance by adding additional systems to the pool which then function all together.
It is easy to illustrate this through hardware scalability (though it is certainly not limited to hardware resources). You could make your web server perform better by adding more CPU and RAM or using a solid state drive. That’s vertical scale. Or, you could improve your web server performance by adding a second web server and using a load balancer to manage traffic. Cloning or duplicating a system is horizontal scale.
Vertical scale can start hitting upper limits because there is only so much that you can do to a single machine. Data or loads are still in one location, and it’s spread across multiple cores (for example) to distribute the load. Horizontal scaling is simpler and more dynamic because you just need to add systems to a pool, but that can start hitting inefficiencies as well, especially if the nodes are simply replicating data between each other or can encounter data mastering issues. As applications and loads become more complex, there are more limits on how effective cloning can be.
A variation of horizontal / vertical scale is to introduce data splitting, so certain types of data or certain operations are located across systems (horizontally) which are optimized for those types of loads (vertically). That’s the XYZ axis scale.
X, Y, and Z Axis Scale
Not all operations or applications need the same thing for high performance. One service may have high user traffic, which means it needs a system with high CPU and memory to be responsive. Another may have low traffic but need to store a lot of data, so it needs disk space or some kind of storage device. Even the same application — like a database — may need to be both high volume and large storage, but need that performance for different users at different times.
Instead of just adding more or better hardware or adding clones to a pool, delivering different types of data to different consumers can be done by handling that data in different ways.
Martin Abbott and Michael Fisher defined this as a scale cube in their book The Art of Scalability.
X-axis scaling is pretty much traditional horizontal scaling, which distributes the total load across a given number of nodes. Y-axis and Z-axis scaling, however, are two entirely different approaches by focusing on different things that can be scaled.
Y-axis scaling refers to breaking out and distributing services. This is called functional decomposition, and it is a design approach which is reflected in service-oriented and microservices architectures. This breaks apart things that are different based on those differences, and this allows an architect or developer to put a given service in a geographic location or on certain hardware that best meets its needs. Y-axis scaling can be a good option as part of planning application development (among a lot of other scenarios).
Z-axis scaling refers to data partitioning. It is a way of distributing data among many nodes or blocks as a way of improving performance. (Think of it as registering a child for camp, where the different registration boots are divided by surname, so A-F are at one booth, G-N at another, and so on.) The underlying data sets contain the same type of information, but any given partition only contains part of the information. This is a common approach for storage environments. Red Hat Ceph Storage, for example, uses erasure coding as data are distributed across blocks, with some data duplicated between blocks to provide resiliency. Red Hat JBoss Data Grid uses in-memory data grids to spread subsets of data across multiple cores, all in the systems’ RAM.
Other Things That Aren’t Scale But Still Matter
Scalability within a system tends to make it more performant (by increasing or better utilizing resources) and more resilient (by having multiple systems available to provide services). Because of that, discussions of scalability often touch on some concepts that are related, but aren’t really scalability: reliability, capacity, and performance.
- Reliability mainly means fault tolerance. If a node or service fails, reliability is defined by how well the system recovers or fails over to backup nodes.
- Capacity is how much load a system can handle, presumably within a certain range of acceptable performance. That could be memory or CPU, network bandwidth, write operations per second. It depends on what is being measured.
- Performance means how effective a system is at performing a task, which generally means how fast it can perform that task.
Reliability, capacity, and performance can improve, degrade, or change as an environment changes or even as you look at different aspects of the environment. Those terms are most accurate when the environment is at a specific state. Another word often associated with scalability is relevant here: dynamic. Scalability is reliability, capacity, and performance within a dynamic environment, meaning it retains those positive characteristics across a variety of states. It’s how effective the system is at handling change.
Scale In Action
Scale can mean different things, depending on your goals and your environment. There is a whitepaper that studies two JBoss Data Grid customers who implemented data grids within their environments to try to improve application performance. The key is how it highlights their goals and both the business and technical challenges they encountered as they were planning for scale.