Why Composability, Why Now?
Written by Phil Harris, CEO at Cerio
One of the major transformations of the last 10 years has been the movement of application development and hosting into the cloud. This allowed enterprises to focus on their core business and let somebody else take care of running data centers. As the saying goes, “friends don’t let friends build data centers.”
However, even cloud providers are challenged to deliver greater agility and service differentiation at a practical cost in the era of AI workloads. Stranded assets and the difficulty of introducing new accelerator, memory and storage technologies effectively are driving up costs while leaving revenue and critical investments on the table.
At the same time, we’re seeing certain classes of workloads repatriated to the on-prem data center. This is happening for a variety of reasons: data sovereignty, unique or custom processing or acceleration requirements, regulatory and policy requirements, and in some cases because the applications being developed are simply too expensive to run in a cloud pay-as-you-go model.
Going forward, the question becomes: how can we achieve the scale we need for AI without an unsustainable increase in cost and complexity? To start with, we need to scale elastically, so that we can apply technology and infrastructure to each workload as needed and accommodate changing demands (e.g., rising adoption rates, or growth in data set sizes and the corresponding need for more accelerated processing in real time).
Composability is critical to meeting these challenges. It enables agility, scale and efficiencies that translate into fundamentally more effective use and availability of every resource in the data center. We can dramatically reduce the growing cost and complexity of data center design with respect to power and cooling, reduce the amount of infrastructure that has to be provisioned, and gain incredibly fine-grained control over exactly which resources we deploy and use.
Let’s look at how Cerio, through composability, is changing the way we build and operate data centers.
Real-time system model
Think about how we currently define a system (usually at the time of purchase): we might use a vendor’s configure-to-order tool to specify a system based on various parameters, components and capacities, or we might pick from an existing set of pre-designed models. A systems integrator may then add more capabilities before the system is finally deployed. This can be a lengthy process, with limits on what is available to choose from and on how far the configuration can be customized without a significant increase in cost and time.
Composability is the ability to buy servers, GPUs, storage and other resources and attach them to a fabric. The fabric then allows those resources to be assigned into compositions that, from the point of view of the server’s BIOS, operating system, drivers and workloads, are all part of a standard server. From that point on, until the composition is disassembled, the server runs as if it had been configured that way in the factory.
The critical difference is the scale of the server we can now build, the diversity of resources we can choose to compose, and the asynchronous nature of how and when we acquire different technologies and capabilities. Composability, when applied to all the servers and resources in a data center, is an incredibly efficient and agile way of using and re-using infrastructure. Doing this at the scale of the data center is where it becomes both technically and commercially viable.
Early attempts in this space have struggled to get past the “proof of concept” stage and have been limited in the scale of system that can be built without resorting to non-standard, customized hardware. The fabric needs to hide the complexity of the composition from the servers themselves, as well as from the operators and support organizations that manage the data center.
Cerio provides a real-time system model that allows the user, operator or application to define what is needed in terms of core processing, memory, storage, and acceleration. That system model is built without losing any of the benefits of converged infrastructure, such as simplified management and strict adherence to well-defined protocols so that applications and drivers do not have to be changed to work.
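To make this concrete, here is a minimal sketch of what such a declarative system model could look like. The Python SystemSpec class, its field names and the example values below are hypothetical illustrations of the idea, not Cerio’s actual schema or API.

```python
from dataclasses import dataclass

# Hypothetical declarative description of a composed system.
# Field names and units are illustrative only, not Cerio's schema.
@dataclass
class SystemSpec:
    name: str
    cpu_cores: int          # host cores on the compute node
    memory_gb: int          # system memory
    gpus: int = 0           # accelerators drawn from the fabric pool
    gpu_model: str = "any"  # optionally constrain to a device type
    nvme_tb: float = 0.0    # fabric-attached NVMe capacity

# A workload-specific composition: a training node that needs four
# GPUs and 8 TB of scratch storage attached to a standard server.
training_node = SystemSpec(
    name="training-node-01",
    cpu_cores=64,
    memory_gb=512,
    gpus=4,
    nvme_tb=8.0,
)

print(training_node)
```

In practice, a spec like this would be handed to the fabric, which assembles the composition and presents it to the operating system as an ordinary server.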
Operational efficiency
Whether in the cloud or on-prem, current system models follow a fixed design principle: cram more capabilities into a single system (server). From converged to hyper-converged infrastructure, this design principle has taken us to the limits of what we can currently build and deploy.
This model has forced some important tradeoffs. Without infrastructure-level agility, we have ended up over-provisioning for potential peaks in use, driving up costs while critical resources sit idle for extended periods, and stranding resources that could augment workloads but are not available to the systems that could use them.
The other limitation is the ability to introduce new capabilities. Once a server is deployed, updating, replacing or altering its configuration is so complex and costly that it is often cheaper to just buy new servers.
AI is forcing a radically different pace of innovation and driving the development of new capabilities needed to augment applications, whether in memory, accelerator, storage or processing technology. The existing system model and the future demands of the data center are increasingly incompatible.
One of the benefits of composability is being able to assemble a system based on the workload requirements at any time. This needs to be done without changing operating systems, drivers, or the workloads themselves, and we need to be able to use off-the-shelf hardware without modification.
From an operational efficiency perspective, the ability to assemble systems in real time eliminates the need to take a server and all its resources out of service to upgrade, replace or add capabilities. In multi-tenant or multi-occupancy systems, a single failure or upgrade has a potentially huge blast radius, affecting many unrelated users or applications.
In a composed system, an individual component can be de-composed, upgraded or replaced, and then re-composed with minimal disruption and significantly lower support costs.
The resources are used for the duration of the workload, and then returned to the resource pool to be immediately available for composition into the next workload, rather than letting them sit idle.
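As a rough sketch of that compose-run-release lifecycle, the snippet below models a shared pool whose devices are attached only for the duration of a workload. The FabricPool class and its composed() context manager are hypothetical stand-ins for a real fabric controller, not an actual Cerio SDK.

```python
import contextlib

# Hypothetical in-memory stand-in for a fabric resource pool. A real
# fabric controller manages physical devices; this only models the
# compose -> run -> decompose lifecycle described above.
class FabricPool:
    def __init__(self, gpus):
        self.free_gpus = list(gpus)

    @contextlib.contextmanager
    def composed(self, server, gpu_count):
        # Compose: claim devices from the shared pool and attach them
        # to the target server for the duration of the workload.
        claimed = [self.free_gpus.pop() for _ in range(gpu_count)]
        print(f"composed {claimed} onto {server}")
        try:
            yield claimed
        finally:
            # Decompose: return the devices to the pool immediately so
            # the next workload can claim them, instead of leaving them
            # stranded inside a fixed server.
            self.free_gpus.extend(claimed)
            print(f"released {claimed} back to the pool")


pool = FabricPool(gpus=["gpu-0", "gpu-1", "gpu-2", "gpu-3"])

with pool.composed(server="node-17", gpu_count=2) as gpus:
    print(f"running workload on node-17 with {gpus}")
# On exit, the GPUs are free for the next composition.
```

The context-manager pattern mirrors the property described above: releasing a resource is as routine as acquiring it, so nothing is left stranded in a server between workloads.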
Increased power efficiency
The operational cost of a data center varies with the scale and types of applications you’re running, but servers alone represent roughly 25% of the total cost of ownership. A similar amount goes into the power and cooling needed to deal with the heat those systems generate and keep them operating effectively. This is both a cost problem and a resource sustainability problem.
After the typical 3–5-year depreciation window, servers are often relegated to less performance-intensive applications or disposed of entirely. The materials used to build servers and other resources contribute 75–80% of the total e-waste on the planet because so little of them is reused. Clearly, there’s a large sustainability impact in continuing to run data centers the way we have for decades.
Since power consumption (and the associated heat generation) is a function of the components and capabilities a server needs to run its workloads effectively, the more exactly you can define those resources and the time they spend drawing power, the better. Over-provisioning, for example, uses more power than necessary, especially since servers draw a certain amount of power even when they are doing nothing.
By composing the systems you need in real time, you can match power consumption to the actual resources the workload requires. When the requirement goes away, you can decompose those systems, release the resources, and put them into much lower power states, drawing power only as needed.
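As a back-of-the-envelope illustration, consider a single GPU that is busy for only part of the day. Every wattage and utilization figure below is an assumption chosen to show the arithmetic, not a measurement of any particular hardware or of Cerio’s fabric.

```python
# Illustrative comparison of an over-provisioned GPU vs. a composed one.
# All figures are assumptions for the sake of the arithmetic, not
# measurements of any specific hardware.
GPU_ACTIVE_W = 400   # assumed draw while running a workload
GPU_IDLE_W = 60      # assumed draw while attached but idle
GPU_PARKED_W = 10    # assumed draw in a pooled low-power state

HOURS_PER_DAY = 24
BUSY_HOURS = 6       # assumed hours per day the GPU actually works

# Fixed server: the GPU is always attached, idling when not in use.
fixed_wh = GPU_ACTIVE_W * BUSY_HOURS + GPU_IDLE_W * (HOURS_PER_DAY - BUSY_HOURS)

# Composed: the GPU is attached only while busy, parked otherwise.
composed_wh = GPU_ACTIVE_W * BUSY_HOURS + GPU_PARKED_W * (HOURS_PER_DAY - BUSY_HOURS)

print(f"fixed:    {fixed_wh / 1000:.2f} kWh/day per GPU")
print(f"composed: {composed_wh / 1000:.2f} kWh/day per GPU")
print(f"savings:  {1 - composed_wh / fixed_wh:.0%}")
```

In this toy example the savings come out to roughly a quarter of the per-device energy; real numbers depend entirely on the hardware and its duty cycle.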
Cost-effective cooling
The more we converge platforms packed with heat-generating resources, the more massive the hotspots we create in the data center. We currently have three cooling methodologies:
- Air – use fans to blow cool air over hot systems.
- Direct liquid cooling – heat transfer plates and pipes carry coolant or other fluids that move heat away from the physical server components. The heat then has to be dealt with somewhere else.
- Immersion cooling – servers or components are immersed in large baths of dielectric fluid to dissipate and remove the heat. However, these systems become more costly as they become more elaborate, including expensive mechanisms for servicing and managing immersed systems.
We need to look at cooling from a whole-data-center perspective. If we change the operational model to a distributed resource model through composability, we can spread power consumption evenly across the entire data center instead of concentrating it in a rack of heavy, power-hungry devices that ends up defining the cooling strategy for the whole facility.
It’s very difficult to mix cooling strategies in the same data center, so common practice is to pick one strategy that works for as much of the infrastructure as possible. By reducing overall power utilization and distributing heat generation, composability lets air cooling go further, which brings the cost of the data center down and its reliability up, because we don’t have to rely on sophisticated but risky cooling mechanisms.
Simplifying integrations
In the last 20-25 years of building out data centers, we have built up a set of widely used information technology service management (ITSM) platforms and other provisioning, operational, analytical and visibility tools, many of which are delivered by software vendors.
Then along came DevOps: the integration of tool chains into the way we build, develop, test, roll into production, and then run applications.
When changing the way we build systems, we need to keep working with the existing system-model tools. This means integrating with different tool chains, which expose both proprietary and open standards for their interfaces and APIs. Tools like Ansible and Terraform are the predominant infrastructure as code (IaC) toolchains that enterprises use for on-prem and cloud-based infrastructure management.
Central to the IaC model is the concept of reusability and disposability, so that resources are only used and allocated as needed. The composability model needs to seamlessly integrate into both traditional ITSM and IaC-based DevOps tools that are critical to how we develop and operate data centers and applications.
Discovery and publication of available systems and compatible resources need to be modelled in a way that exposes new methods and opportunities for efficiency while maintaining existing constructs for the (now) composed systems. Cerio has developed a middleware layer that provides a fully ACID/transactional operational environment. This ensures reliable execution and visibility into the state of the data center and all the resources and workloads deployed.
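To illustrate what transactional composition means in practice, here is a minimal sketch in which a composition either attaches every requested resource or rolls back completely. The FabricClient class, its methods and the failure case are hypothetical stand-ins, not Cerio’s actual middleware API.

```python
# Minimal sketch of transactional composition: either every requested
# resource is attached, or none are. The FabricClient class and its
# methods are hypothetical, not Cerio's actual middleware API.
class CompositionError(Exception):
    pass


class FabricClient:
    def __init__(self):
        self.attached = []

    def attach(self, server, resource):
        # Simulate a device that cannot be attached.
        if resource == "bad-device":
            raise CompositionError(f"{resource} unavailable")
        self.attached.append((server, resource))

    def detach(self, server, resource):
        self.attached.remove((server, resource))

    def compose(self, server, resources):
        done = []
        try:
            for res in resources:
                self.attach(server, res)
                done.append(res)
        except CompositionError:
            # Roll back the partial work so the data center never
            # reports a half-built system.
            for res in reversed(done):
                self.detach(server, res)
            raise


fabric = FabricClient()
try:
    fabric.compose("node-42", ["gpu-0", "bad-device", "nvme-3"])
except CompositionError as err:
    print(f"composition rolled back: {err}")
print(f"attached after failure: {fabric.attached}")  # -> []
```

The all-or-nothing behaviour is what keeps the reported state of the data center trustworthy for the ITSM and IaC tools layered on top.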
The journey into the cloud provides valuable insights into building data centers at scale, but now it’s time to bring the benefits of the cloud into the on-prem data center. Composability makes this possible. With Cerio, data center operators can take advantage of this new technology in real time, without disruption to their existing systems, tools and processes.
In my next blog, I’ll dive into what to expect for the future of AI infrastructure.