The Evolution of Blade Systems: From Web Tiers to AI-Ready Infrastructure 


History of blade systems 

When enterprise applications were first being deployed in data centers, they were designed in three tiers: 

  1. Front-end web tier: where users interact with an application. 
  2. Business logic tier: where the application’s business logic resided, whether CRM, enterprise resource planning, or something else. 
  3. Data tier: where data management resided for whatever application was running. 

These three-tiered (more generally, multi-tiered) applications became very desirable in the data center because the tiered approach allowed each tier to be scaled independently based on demand. This enabled web technologies and applications to scale in a manner that had not been possible previously. 

As data center compute platforms matured, it became easier to define and specialize the infrastructure for each of the tiers needed to support multi-tiered applications. 

  • The front tier hosted the web servers, the system of engagement that interfaced with the application’s users. 
  • The middle tier consisted of larger application servers that handled business logic and required more memory, particularly for tasks like online transaction processing or in-memory database operations. 
  • The back-end tier managed the databases and handled data storage and retrieval. 

Where do blade systems come into play? 

As application requirements grew, the need for high-density web front ends increased, demanding more individual compute capacity at the front of the application tier. Blade servers were well-suited for this need. They offered a compact, efficient design—typically based on standard one- or two-socket systems—that could be vertically mounted in a chassis, optimizing space and scalability compared to traditional horizontally mounted servers. 

Unlike traditional rack-mounted servers, with their horizontal orientation, the blades’ vertical arrangement allowed more individual compute nodes to fit within the same rack space. By housing these nodes in a common chassis, organizations could centralize management and consolidate networking and other I/O resources, improving efficiency and scalability. 

Some vendors went further by integrating networking directly into the blade system, enabling even tighter coupling between the blades themselves and the external networking resources that attach to the other tiers in the data center. Over time, blade servers grew more powerful as socketed CPU performance improved. 

They often ran virtualized environments as well, with VMs hosted on top of the blade systems. But blade systems have an inherent space constraint: packing a very large number of compute nodes into a vertically oriented chassis leaves little physical room for other components. 

For example, keeping blade systems cool requires a very clear airflow path through the chassis. Combined with the components already packed onto blade servers, this made it hard to add new technologies, like GPUs. 

Blade servers don’t have the same space on their motherboards as traditional rack servers. There’s no riser slot to plug in third-party devices like GPUs, so we need a different way to add those kinds of resources to blade systems. 

Some vendors addressed this by reducing the number of compute blades in a chassis and using the extra slots to add GPU carriers—allowing up to two GPUs per blade. However, this approach reduces compute density and increases overall costs while only providing limited acceleration. It falls short of delivering the level of performance needed for the wide range of applications you might want to run on blade systems. 

That’s why we need a new approach to integrate not just GPUs, but also DPUs, other accelerators, and additional infrastructure into blade systems. 

Composability is the perfect solution to that problem.  

AI inference and today’s applications 

Blade systems can benefit from GPU augmentation for applications like AI inference, where pre-trained language models power specific use cases. 

These models need to be served to end users through an inference layer, where the interaction between users and models occurs. Blade systems are perfect for this type of application due to their high compute density. They can support a large number of compute nodes at the edge of an AI system, and those nodes can easily be enhanced with GPUs to accelerate the AI inference layer. 
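As a rough illustration, a minimal per-node inference handler might look like the sketch below. It assumes PyTorch and Hugging Face Transformers, uses the small gpt2 model purely as a stand-in for whatever pre-trained model the use case calls for, and picks up an attached GPU when one is present:

```python
# Minimal inference-node sketch (illustrative; the model and framework
# choices here are assumptions, not tied to any particular blade vendor).
import torch
from transformers import pipeline

# Use an attached GPU if one is present; otherwise fall back to the CPU.
device = 0 if torch.cuda.is_available() else -1

# A small pre-trained model stands in for the real production model.
generator = pipeline("text-generation", model="gpt2", device=device)

def handle_request(prompt: str) -> str:
    # Each blade node runs the same stateless handler; the web front end
    # load-balances user requests across the nodes.
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]
```

Because each node is stateless, scaling the inference layer amounts to adding blades and attaching GPUs to them.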

How we bring more GPUs to blade systems for AI inference is therefore critical. 

One way to achieve this is by scaling GPUs on a per-blade basis to meet the specific needs of an AI inference application. Users can choose different types and densities of GPUs for a blade system, depending on the workload. This approach provides greater flexibility and scalability, both at the hardware compute layer and in the AI inference layer through the dynamic integration of GPUs, allowing customers to maximize the use of their infrastructure and resources. 
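To make this concrete, a composition request for one blade might look like the following sketch. The fabric client, its free_gpus and attach methods, and the GPU model name are all hypothetical, shown only to illustrate the idea of attaching pooled GPUs to a blade on demand:

```python
# Hypothetical sketch of per-blade GPU composition from a shared pool.
# The `fabric` client and its methods are illustrative assumptions,
# not a real vendor SDK.
from dataclasses import dataclass

@dataclass
class GpuRequest:
    model: str   # desired GPU type, e.g. "L40S" (name is illustrative)
    count: int   # how many GPUs to attach to this blade

def compose_gpus(fabric, blade_id: str, req: GpuRequest) -> list[str]:
    """Attach free GPUs of the requested model from the shared pool
    to the blade; return the attached device IDs."""
    free = [g for g in fabric.free_gpus() if g.model == req.model]
    if len(free) < req.count:
        raise RuntimeError("not enough free GPUs of that model in the pool")
    return [fabric.attach(blade_id, g.id) for g in free[:req.count]]

# Example: an inference-heavy blade gets two GPUs, a lighter one gets one.
# compose_gpus(fabric, "blade-07", GpuRequest(model="L40S", count=2))
```

The point of the sketch is that the GPU-to-blade ratio becomes a software decision made per workload, rather than a fixed property of the chassis.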

Why composability? 

Composability offers the flexibility to attach the right scale and type of GPUs, NVMe storage, or other resources needed for your system.  

Trying to achieve this in a traditional blade environment would be challenging: physically attaching the resources is difficult, and doing so reduces the number of compute nodes in the chassis. That would make it impractical and commercially unviable to continue using blade systems. 

Denser blade deployments are now possible because valuable space in the blade enclosure is no longer required to add new resources like GPUs, NVMe storage, or other accelerators. As new classes of accelerators come to market, the same ease of adoption that classical rack servers enjoy now applies to blade systems. This extends the life and commercial viability of the blade system, giving customers more choice in how they deploy those resources in the data center. 

Why Cerio? 

Cerio believes that any server in the data center should be able to leverage any GPU at any time. Extending this capability to blade servers allows every type of server deployment in the data center to benefit from GPUs, enhancing and accelerating the applications running on them. We’re no longer limited to a specific type of server. 

Composability makes this flexibility available to all servers, including blade systems. As servers and GPUs become more interdependent, the ability to scale using blade systems will be a crucial attribute for any data center moving forward. 
