Explainer Video: Cerio Composability Platform

AI workload diversity makes it difficult to serve all classes of applications with a single system design, putting unprecedented pressure on infrastructure built on a static system model. AI applications run on dedicated accelerators like GPUs, which today must be deployed inside dedicated, specialized servers that drive up operational cost and strand resources.

Composability provides a dynamic and cost-effective foundation for agile infrastructure built to support AI acceleration and diverse application environments. Systems can be composed in real time and optimized for their applications: GPU mobility to move resources where they are needed, GPU density matched to processing requirements, and full heterogeneity to add any PCIe device to the resource pool.

VIDEO: Cerio CTO Matthew Williams discusses the use cases for agile infrastructure and the design principles of the Cerio Platform for achieving composability at scale.

Video transcript

Hi, I’m Matthew Williams. I’m the CTO at Cerio and want to talk to you about the future of AI infrastructure. AI has an infrastructure problem. While generative AI is getting all the attention right now, the infrastructure itself is the real underlying challenge. AI applications run on dedicated accelerators like GPUs, and in the current 30-year-old system model, these GPUs need to be deployed inside dedicated, specialized servers.

These servers are more expensive than standard servers, and needing a new variant of server creates many challenges for IT operations. Because the AI applications run on the GPUs, the CPUs within these servers sit essentially idle, wasting both cost and power.

With this old system model, each GPU server supports a fixed number of GPUs. Typical GPU servers support only 4 GPUs, with some very specialized servers supporting 8. However, each AI application requires a different number of GPUs and a different ratio of GPUs to CPUs for optimum performance. Because the number of GPUs and the GPU-to-CPU ratio are fixed within GPU servers, and rarely optimum for the application, GPU and CPU assets become stranded and unavailable to other applications, leading to significant underutilization. Not having enough GPUs available to run applications then leads organizations to overprovision, driving even more stranded assets across servers, CPUs, and GPUs.

Across many conversations with hundreds of organizations over the last 18 months, three main use cases for agile infrastructure come up again and again.

The first use case is GPU capacity. The stranded assets and underutilization caused by the 30-year-old system model are the problems raised most often, across every vertical, every scale, and every type of organization. Having the right number of the right type of GPUs available to standard servers is the number one ask.

The second use case is storage agility. Most storage use cases have moved or are currently moving to NVMe drives. This is driving the adoption of large-scale NVMe deployments for both boot and tier 0 data. As with any device in the data center, NVMe drives fail. When an NVMe drive needs to be replaced, a technician has to be sent down the aisles to replace the failed drive.

In some cases, this also requires the server to be powered down. Replacing an NVMe drive can take hours from the time the failure is first detected, and until a failed drive is replaced, the data redundancy model is compromised, putting key data at risk. Replacing failed devices like NVMe drives, or even just adding capacity through software without requiring physical intervention, is almost always part of our conversations.

The third use case is about control and cost management. Data center operators need the flexibility to meet their end users’ demand for a new AMD, NVIDIA, or Intel GPU, or any type of focused accelerator. Or operators suddenly find they need five times as many GPUs as they had originally planned, and don’t have the budget to constantly buy new servers to put them in. How can they possibly keep up with dynamic demand on a static budget cycle?

This diagram shows Cerio’s agile infrastructure from a high-level system point of view. On the left side is the physical architecture, starting with servers, each containing a fabric node. You can deploy more than one fabric node in a server, but we’re showing one here for simplicity. Hundreds of servers can be supported in the same fabric.

The lower part of the diagram shows the device enclosures containing at least one fabric node, and the GPUs, TPUs, NVMe drives, or any other type of PCI Express device needed by the end user.  

In the middle is the SHFL, providing passive connectivity between the server hosts and device enclosures. 

The Cerio Fabric Manager discovers all these physical devices and presents a logical view of the available inventory northbound to a service management layer, in a way that is consumable by standard tools such as Ansible or Terraform.

On the right side, you can see the logical view with a pool of servers and resources that are available for composition to a server. The flexibility of our approach makes it possible to compose any resource to any server in the data center at any time. That makes true heterogeneity achievable and significantly reduces the cost of building and running infrastructure with standard components. And because we adhere to standard PCI Express, there’s nothing new to manage within the server. No disruptions to existing server operational environments, and fewer server variants to manage.  
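As a purely illustrative sketch of how that northbound inventory view might be consumed, the snippet below queries a hypothetical fabric-manager REST endpoint for devices that are not yet composed to a server. The base URL, endpoint path, and field names are assumptions made for this example, not Cerio’s published API.

```python
# Hypothetical example only: query a fabric manager's northbound REST
# inventory for devices that are still available for composition.
# The URL, paths, and field names are assumed, not Cerio's actual API.
import requests

FABRIC_MANAGER = "https://fabric-manager.example.local/api/v1"  # assumed base URL


def list_available_devices(device_type=None):
    """Return inventory entries that are not currently composed to any host."""
    resp = requests.get(f"{FABRIC_MANAGER}/inventory/devices", timeout=10)
    resp.raise_for_status()
    return [
        d for d in resp.json()
        if d.get("state") == "available"
        and (device_type is None or d.get("type") == device_type)
    ]


if __name__ == "__main__":
    # Print the pooled GPUs that could be composed to any server in the fabric.
    for gpu in list_available_devices(device_type="gpu"):
        print(gpu["id"], gpu.get("model"), gpu.get("enclosure"))
```

A tool such as Ansible or Terraform would sit above an interface like this one, treating pooled devices as inventory to be assigned on demand rather than as hardware fixed to individual servers.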

Together, these capabilities deliver the agile AI infrastructure that is needed for modern data center applications. Through GPU optimization, users can create servers with the right ratio of the right types of devices on demand. 

With resource agility, PCI Express devices such as NVMe drives and accelerators can be easily deployed, scaled in place through software-based assignment, and replaced when device failures occur. New types of devices can be added to device enclosures at any time, enabling new, innovative technologies to be adopted quickly and giving end users access to the right set of devices instead of being locked into a fixed set of resources.

And finally, dynamic scale allows the number of devices within each server to grow on demand, and the number of servers and devices to scale easily as use cases expand and budgets permit.

Here we’re showing an example in which three different applications each want a different mix of resources. Through existing IT service management platforms and the Cerio Fabric Manager, each application can access the right number of the right types of devices for optimal application performance, regardless of where the servers and devices are physically located within the data center. Once an application runs to completion, the devices used by that application are freed up and become available to any other application, ensuring that no device ever becomes stranded.
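To make that compose-and-release lifecycle concrete, here is a minimal sketch under the same assumptions as the inventory example above: a hypothetical REST endpoint on the fabric manager that attaches pooled devices to a host and later returns them to the pool. The endpoint paths, payload fields, and identifiers are illustrative, not Cerio’s published interface.

```python
# Hypothetical example only: compose pooled devices to a server for the
# lifetime of an application, then release them back to the shared pool.
# Endpoints, payloads, and identifiers are assumed for illustration.
import requests

FABRIC_MANAGER = "https://fabric-manager.example.local/api/v1"  # assumed base URL


def compose(host_id, device_ids):
    """Ask the fabric manager to attach the listed devices to a host."""
    resp = requests.post(
        f"{FABRIC_MANAGER}/compositions",
        json={"host": host_id, "devices": device_ids},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["composition_id"]


def release(composition_id):
    """Detach the devices and return them to the pool when the job finishes."""
    resp = requests.delete(f"{FABRIC_MANAGER}/compositions/{composition_id}", timeout=30)
    resp.raise_for_status()


# Example: one application needs six GPUs on server-01 for a training run.
comp = compose("server-01", [f"gpu-{n:02d}" for n in range(3, 9)])
# ... run the application ...
release(comp)  # the GPUs immediately become available to other applications
```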

Cerio’s composability platform delivers true agility at data center scale, with industry-leading economics and scale for AI applications. Based on our customers’ use cases, we see a very significant decrease in the number of servers needed for AI applications, a dramatic improvement in total cost of ownership, and a much more sustainable AI infrastructure.

For more information on the Cerio Composability Platform, please visit our website at cerio.io. Thank you. 

Discover the future of data center transformation.

Learn more about the technology behind Cerio’s distributed fabric architecture.
Read the tech primer