Composability: The Future of AI Infrastructure

Written by Phil Harris, CEO at Cerio

In recent months, we’ve been hearing from operators of large data centers, including cloud providers, about the challenges of supporting the various types of AI applications they run themselves or host as infrastructure instances for their customers’ workloads. A uniform fleet of servers based on the current system design, with a limited set of “t-shirt sizes,” is simply too restrictive.
 
To be more specific, we’re at the point where AI infrastructure needs to be adaptable in terms of scale, agile in terms of resources (including accelerators, storage, and I/O), and efficient in terms of energy consumption. For that reason, we must rethink the way we approach building systems and the data center itself.

For the last 20 years, we’ve used a general-purpose compute model for most enterprise applications, whether in the web tier for systems of engagement, in business applications, or in systems of record and data processing. We have been able to get away with general-purpose systems because applications were still primarily optimized for (and by) standardized CPU instruction sets and the vendor-provided design points that were available. Those systems worked efficiently enough, with sufficient availability and affordability, to keep things easy from an operational and procurement perspective.

But now we need a new way of building systems to meet the requirements of AI infrastructure.  

Assembly

We can no longer use the generalized approach we relied on previously, with a standard system model applied across several different workload types. So the question is: how can we create the systems we need, when we need them? There are two ways to do this:

  • Specific, discrete configurations designed to accommodate large footprints of accelerator cores (e.g., tensor or ray-tracing cores), large memory bandwidth, extremely dense I/O, and highly specialized power and cooling. Systems like this exist today, but at a significant price premium and usually at the expense of heterogeneity. As workload requirements rapidly evolve and the available acceleration technology becomes more specialized, systems designed today will become obsolete well within the depreciation window of typical hardware.
  • Composable infrastructure, i.e., creating the systems you need in real time from available resources located within the same data center. Systems like this can be easily “designed,” configured, and augmented as resources are deployed or become available. They can then be torn down, or “deactivated,” making resources available once again to be reconfigured for the next workload. This means we can procure servers, accelerators, storage, and other components such as TPUs, IPUs, and DPUs that accelerate specific AI applications at different points in time, without affecting stability while continually improving performance. (A minimal sketch of this compose/decompose lifecycle follows below.)
     
Economic advantage is one of the first things we should consider when looking at any transformational technology, and composability is no different. If the method used to build composable systems isn’t commercially or technically viable, there isn’t any point in doing it. This is why building the right “composable data center” requires careful consideration. From a technical perspective, one of the most important aspects of composable systems is that they allow easy adoption and deployment of new and innovative resources across different technology domains, with better silicon diversity and more customization, without a “hockey stick” increase in either complexity or supportability costs.
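
To make the lifecycle above concrete, here is a minimal sketch of composing and then decomposing a system through a hypothetical fabric-manager REST API. The endpoint paths, payloads, and FABRIC_URL are illustrative assumptions for this article, not any specific vendor’s interface.

```python
# Sketch of a compose/decompose lifecycle against a hypothetical fabric-manager API.
import requests

FABRIC_URL = "https://fabric-manager.example.internal/api/v1"  # hypothetical endpoint

def compose_system(host_id, resource_filters):
    """Claim free fabric-attached devices matching each filter and attach them to a host."""
    attached = []
    for wanted in resource_filters:
        # Ask the (hypothetical) fabric manager for unallocated devices of the requested kind.
        free = requests.get(f"{FABRIC_URL}/resources",
                            params={"state": "free", **wanted}).json()
        if not free:
            raise RuntimeError(f"no free resource matching {wanted}")
        device = free[0]
        # Bind the device to the host; the host then sees it as a locally attached PCIe device.
        requests.post(f"{FABRIC_URL}/attachments",
                      json={"host": host_id, "resource": device["id"]}).raise_for_status()
        attached.append(device["id"])
    return attached

def decompose_system(host_id, resource_ids):
    """Release devices back to the shared pool once the workload is finished."""
    for rid in resource_ids:
        requests.delete(f"{FABRIC_URL}/attachments/{host_id}/{rid}").raise_for_status()

# Example: build a 4-GPU training node for one job, then return the GPUs to the pool.
#   gpus = compose_system("host-17", [{"type": "gpu"}] * 4)
#   ... run the training job ...
#   decompose_system("host-17", gpus)
```

The key point is the last step: the same GPUs that served one workload go back into the shared pool rather than sitting idle inside a fixed server.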

So, how does one assemble systems in real time? Let’s look at a few realities. Everything in the realm of computer science is a compromise between cost, complexity, and performance. Composability offers incredible advantages in terms of scale, agility, efficiency, and cost control, but we must ensure we make the right compromises when taking this evolutionary step.

As discussed, composability is about real-time assembly based on available resources and on knowing what’s needed to optimize a given workload. Composability also changes the procurement process. Previously, we needed to define a “server” with all the required resources and peripherals; now one can procure components as they are needed or as they become available.
 
DevOps was a major change in the way application developers and system operations teams behaved, leading to more focus on rapid development and continual, iterative integration and delivery of code. One of the models that quickly grew in importance was “infrastructure as code” (IaC), i.e., thinking of and managing infrastructure as reusable and disposable functions. IaC also meant there could be little-to-no distinction between the resources used in a development, test, or production deployment. This works well in abstracted models such as cloud infrastructure-as-a-service but requires much more standardization in on-prem deployments. Composability allows a consistent design approach that can easily be modified, improved, deployed, and redeployed, an approach that is at the heart of the IaC paradigm.
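
As an illustration of treating a composed system as code, here is a sketch of a declarative spec plus a small reconciler. The ComposedSystem dataclass, the fabric client, and its attached/attach/detach methods are hypothetical names chosen for this example, shown only to convey the idea of versionable, reusable system definitions.

```python
# IaC-style sketch: a composed system as a declarative, versionable spec
# that a reconciler drives against the shared resource pool.
from dataclasses import dataclass, field

@dataclass
class ComposedSystem:
    name: str
    host_profile: str            # e.g. a standard 2-socket server from the fleet
    gpus: int = 0
    nvme_tb: int = 0
    nics_100g: int = 0
    labels: dict = field(default_factory=dict)

# The same spec is reused unchanged across dev, test, and production,
# which is the "little-to-no distinction" property IaC aims for.
training_node = ComposedSystem(
    name="llm-finetune-01",
    host_profile="std-2s-512g",
    gpus=8,
    nvme_tb=30,
    nics_100g=2,
    labels={"env": "prod", "team": "ml-platform"},
)

def reconcile(spec, fabric):
    """Drive the fabric toward the declared state: attach what is missing, detach extras."""
    current = fabric.attached(spec.name)   # hypothetical call returning counts per resource kind
    for kind, wanted in [("gpu", spec.gpus), ("nvme_tb", spec.nvme_tb), ("nic_100g", spec.nics_100g)]:
        delta = wanted - current.get(kind, 0)
        if delta > 0:
            fabric.attach(spec.name, kind, count=delta)
        elif delta < 0:
            fabric.detach(spec.name, kind, count=-delta)

# Example usage (with an already-initialized hypothetical client):
#   reconcile(training_node, fabric_client)
```

Because the spec is just data, it can live in version control, be reviewed, and be redeployed exactly like application code.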

Composability also allows us to think about very discrete capabilities we want to bring into the data center for specific optimizations, which is commercially difficult to do when procuring systems at a fleet level. Standard system designs become problematic when they push out the ability to bring in new technology from smaller, specialized, or more diverse technology providers. With composability, one doesn’t have to dedicate resources to a given application within a fixed server structure, where those resources can only ever be used inside that system, leading to the problem of “stranded assets.” Avoiding stranded assets, in turn, leads to over-provisioning so that enough resources of the right types are always available when needed. The cost overhead of an approach that dedicates resources within a fixed server structure is very significant.

Composability lets you truly decouple resources from fixed system models. There’s no economic inhibitor to co-mingling resources specific to a rarely used application with resources for more mainstream capabilities. Capacity planning becomes more optimal, and new capabilities can be adopted much more rapidly. Think of this as the democratization of the data center.

This democratization is how composability promotes a larger accelerator industry. For example, at the International Supercomputing Conference, there were about 200 discrete accelerator vendors demonstrating their capabilities. The ability to bring any of those accelerators into the data center and compose them into a system model is something we simply couldn’t do before, whether in the enterprise or with cloud providers. Composability makes this much more accessible, which will stimulate a lot of activity, innovation, and competition, and that is a good thing for both suppliers and customers.

Agility 

In every evolution of data center compute, one of the important drivers has been to provide more agility. Through composability, agility means you aren’t fixed to one system model (or a few of them); you can create the system model you need when you need it.

You can have an evergreen model that continues to bring innovations into the data center while still standardizing the way the systems work in terms of applications, drivers, and operating systems. This also significantly improves operational efficiency. For example, one can now remove or add resources in real time without large-scale disruption in the data center. Agility also means that one can modify the number of resources allocated to a workload based on demand. In retail, for example, some periods have much higher peak utilization than others, and you want to ensure the right resources are available, in real time, to meet those peak requirements.

Primarily, agility is about having a real-time environment in terms of how systems are built and how the technology is employed within those systems. But we can take it one step further. Think about how that shrinks the impact of failures in the data center. Instead of a failing GPU taking out an entire server, along with all the tenants or other workloads running on it, we can minimize the impact of repairing or replacing that device. Today, you’d have to take the entire system out of operation to change one component. Now, that component is composed, and can be decomposed, upgraded, repaired, or replaced without the rest of the system being disrupted.
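
A rough sketch of that repair flow, again against a hypothetical fabric client (detach_device, mark_for_service, find_free, and attach_device are illustrative names, not a real API):

```python
# Replace a failed GPU without taking the host, or its other tenants, offline.
def replace_failed_gpu(fabric, host_id, failed_gpu_id):
    # Detach only the faulty device; the rest of the composed system is untouched.
    fabric.detach_device(host_id, failed_gpu_id)
    fabric.mark_for_service(failed_gpu_id)       # goes to the repair queue, not the trash

    # Pull an equivalent spare from the shared pool and attach it in its place.
    spare = fabric.find_free(device_type="gpu")
    fabric.attach_device(host_id, spare)
    return spare  # the workload resumes without a host reboot or full-system outage

# Example usage (with an already-initialized hypothetical client):
#   replace_failed_gpu(fabric_client, "host-17", "gpu-0042")
```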

Elasticity 

One of the most compelling aspects of the cloud is elasticity. One can select a certain size of virtual machine or virtual private data center, and if it needs to scale, there is elasticity in the parameters of that machine.

We need that same elasticity in on-prem data centers. To get it, we need the required resources to be composed in real time at a system level, and we need to be able to add to or take away from existing capacity. That elasticity isn’t only about expanding; it’s also about shrinking back when resources are no longer needed, or when fewer of them are.
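
For illustration, a sketch of that scale-up and scale-back behavior with the same hypothetical fabric client; rescale_accelerators, list_attached, find_free, and fabric_client are all assumed names:

```python
# Grow or shrink the number of fabric-attached GPUs on a host to match demand.
def rescale_accelerators(fabric, host_id, target):
    attached = fabric.list_attached(host_id, device_type="gpu")
    if target > len(attached):
        # Scale up: pull free GPUs from the shared pool for the peak period.
        for _ in range(target - len(attached)):
            fabric.attach_device(host_id, fabric.find_free(device_type="gpu"))
    else:
        # Scale back: release surplus GPUs so other workloads can use them.
        for device_id in attached[target:]:
            fabric.detach_device(host_id, device_id)

# Example usage (with an already-initialized hypothetical client):
#   rescale_accelerators(fabric_client, "host-17", target=8)   # ahead of the nightly peak
#   rescale_accelerators(fabric_client, "host-17", target=2)   # shrink back afterwards
```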

Elasticity must work in a very stable manner so that applications, drivers, and systems can take advantage of that elasticity without causing operational problems. One of the reasons we’re doing composability on a data center scale is to ensure that we have consistency in the operational model, so that the way that servers operate today doesn’t change. The only difference is that now you can allocate resources based on availability and need.  

There’s work to be done to make the composability model graceful across the software stack, as most drivers today don’t expect to see new resources added in real time or, more importantly, removed. But by working with a broad ecosystem of technology vendors, we can ensure that composability becomes a mainstream aspect of all data centers.

Sustainability 

Going forward, sustainability is what we should all be thinking about when designing or optimizing new and existing data centers. The amount of power we consume and the methods we use for cooling are reaching a tipping point in terms of cost, availability, and global impact.

Sustainability also includes thinking about how we can reuse technology. About 75% of all toxic e-waste on the planet comes from data centers. All too often, when new technology comes along, servers from the previous generation are deemed no longer applicable and are disposed of, resulting in an unnecessary refresh cycle.

Those refresh cycles will vary, but they drive greater consumption of resources and the need to dispose of aged-out hardware we can no longer use. What if we could continue to use those resources and extend their lifespan? What if we could be more efficient with the power we consume across the data center?

With composability, we can be more deliberate about how we design data centers and compose resources into systems, rather than over-provisioning and having resources sit idle while still consuming significant power. It changes the whole model of not only how we build systems, but how we power them.

Cooling also becomes more straightforward: rather than densely packing high-power devices close together, we can distribute them across the data center and go back to more affordable air-cooled systems, normalizing and reducing power requirements. We can continue to reuse older servers with newer GPUs, further reducing resource demands and waste.

If we don’t change the way we build data centers and the system models we use, the cost and availability of technology will remain an impediment for a much broader range of data center operators.

Composability is something we’ve been talking about for a long time, and until now, we’ve had only limited examples of its benefits for the general market. We’ve seen early, small-scale deployments and some rudimentary attempts at scalability, elasticity, agility, operational efficiency, and sustainability. It hasn’t reached the point where it’s economically, operationally, and technically attractive to the market – but that will all change in 2024.

This is the year when the technology and the opportunity for composability will meet, enabling deployment at data center scale across a broad range of use cases and all standard server implementations, while utilizing the standard, commercially available PCIe devices on the market.

We now have the ability to bring scalability, elasticity, agility, and sustainability to the market, with all the capabilities required to make it robust, secure, performant, and cost-effective. It will reduce total cost of ownership, improve the overall operational efficiency of data centers, and support a broad range of use cases – something the industry has wanted for decades but has not been able to achieve until now. This is the year of composability.

Cerio will bring this technology into the market in 2024.

Discover the future of data center transformation.

Learn more about the technology behind Cerio’s distributed fabric architecture.
Read the tech primer