CXL picks up steam in data centers.

CXL is gaining traction within large data centers as a way to increase the utilization of various compute elements, such as memories and accelerators, while reducing the need for additional racks of servers. But the standard is being expanded and changed so rapidly that it is difficult to keep up with all the changes, each of which needs to be verified and validated across a growing body of heterogeneous and often customized designs.

At its core, Compute Express Link (CXL) is a cache-coherent interconnect protocol for memories, processors, and accelerators, enabling flexible architectures to more efficiently handle different types and sizes of workloads. This, in turn, will help ease the pressure on data centers to do more with less, a seemingly overwhelming challenge given the explosion in the amount of data that needs to be processed.

In the past, the typical solution was to throw more compute resources at any capacity problem. But as Moore’s Law slows down, and the amount of power needed to run servers and cool racks continues to grow, system companies are looking for alternatives. This has become even more important as power grids hit their limits and societal demands for sustainability increase.

Developed largely by Intel, and based on the PCIe standard, CXL offers an attractive way to balance these conflicting dynamics. Optimizing the way a data center uses memory can increase performance, while also reducing stack complexity and system costs. In particular, CXL allows for low-latency connectivity and memory coherency between the CPU and memory on attached devices, keeping data consistent between them.
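
For context, the sketch below summarizes how CXL layers its three sub-protocols and which combinations the specification defines for each device type. The mapping follows the public CXL specification, but the Python itself is only an illustrative lookup, not part of any CXL software stack.

```python
# Illustrative summary of the three CXL sub-protocols and the device types
# defined in the specification. This is a plain lookup sketch, not part of
# any CXL software stack.

SUB_PROTOCOLS = {
    "CXL.io":    "PCIe-based I/O: discovery, configuration, DMA, interrupts",
    "CXL.cache": "lets a device coherently cache host memory",
    "CXL.mem":   "lets the host access device-attached memory",
}

DEVICE_TYPES = {
    "Type 1": {"protocols": ("CXL.io", "CXL.cache"),
               "example": "SmartNIC or accelerator without exposed local memory"},
    "Type 2": {"protocols": ("CXL.io", "CXL.cache", "CXL.mem"),
               "example": "GPU/accelerator whose local memory is shared coherently"},
    "Type 3": {"protocols": ("CXL.io", "CXL.mem"),
               "example": "memory expansion or pooled-memory module"},
}

if __name__ == "__main__":
    for device, info in DEVICE_TYPES.items():
        print(f"{device}: {', '.join(info['protocols'])} -> {info['example']}")
```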

This is especially important for high-throughput workloads, such as AI training, where more data typically equates to increased accuracy, as well as for increasingly electrified vehicles, smart factories, drug discovery, and the large-scale simulations required for climate modeling.

The CXL Consortium, formed in 2019 by founding members Google, Microsoft, HPE, Dell EMC, Cisco, Meta, and Huawei, introduced the first version of the specification, based on PCIe 5.0, that same year. Since then, AMD, NVIDIA, Samsung, Arm, Renesas, IBM, Keysight, Synopsys, and Marvell, among others, have joined in various capacities, and the consortium has absorbed the Gen-Z and OpenCAPI technologies. In August, specification 3.0 was introduced with double the bandwidth, support for multi-level switching, and coherency improvements for memory sharing.

As standards go, it is developing very quickly. And given the base of support for CXL from deep-pocketed companies, it seems likely that this standard will become widespread. But its rapid evolution has also made it difficult for IP developers to pivot quickly from one version of the standard to another.

A booming market
“We should be looking at this very strongly in the next few years,” said Arif Khan, group director of product marketing for PCIe, CXL, and Interface IP at Cadence. He noted that the addressable market for CXL-based applications is expected to reach $20 billion by 2030, according to some memory maker forecasts.

Others are similarly optimistic. “Many customers are adopting CXL for their next-generation SoCs, accelerators, SmartNICs and GPUs, as well as memory expansion devices,” said Richard Solomon, technical marketing manager for PCI Express controller IP at Synopsys.

“Almost everyone is building their servers with CXL capability,” said Brigid Asay, senior planning and marketing manager at Keysight Technologies. “Standards bodies such as JEDEC have agreements with the CXL Consortium to work across standards and ensure interoperability. CXL has also acquired assets from Gen-Z and OpenCAPI, which were offering the same capabilities as CXL, but CXL had the staying power.”

Still, widespread adoption will take time, no matter how quickly the standard develops. Despite the attraction of shared resources, data centers are conservative when it comes to adopting any new technology. Any malfunction can cost millions of dollars in downtime.

“While there is a lot of excitement around CXL, the technology is still in its early days,” said Jeff Defilippi, senior director of product management for Arm’s infrastructure line of business. “To be widespread, the solution must go through a rigorous functional and performance validation process before seeing production deployment with OEMs and cloud service providers.”

Varun Aggarwal, senior staff product marketing manager at Synopsys, observed that numerous memory and server SoC companies have expressed support for CXL over the past three years. But bringing products to market that fully support the CXL topology and bandwidth is a slow process. “More and more designs are now choosing to adopt CXL through CXL.io for their PCIe data paths, with an eye toward expansion into other types of devices. CXL adoption in data centers has been slow in terms of product rollout, and one of the reasons for this is the lack of verification and validation infrastructure.”

Aggarwal noted that the user community is increasingly looking to CXL transactors, virtual models and hybrid solutions, in-circuit speed adapters, and interface-card hardware solutions as a first step. “CXL exemplifies a software-first approach for companies that want to kick-start hardware-software validation, bring up software, and achieve their time-to-market goals in parallel.”

System-level validation is also a requirement. “Depending on the features supported, validation can span memory features such as resource sharing, pooling, and expansion; coherency between hosts and devices; security and routing; hot-add and hot-remove with different virtual hierarchies and multiple domains; and correlated performance, especially latency for CXL.cache and CXL.mem,” Aggarwal explained.
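
As a rough illustration of how such a validation plan could be organized, the feature areas Aggarwal lists can be captured in a simple test matrix. This is a hypothetical sketch, not Synopsys tooling, and the check names are assumptions.

```python
# Hypothetical sketch of a CXL validation matrix organized around the feature
# areas mentioned above. The names are illustrative, not a real test plan.

from dataclasses import dataclass

@dataclass
class ValidationArea:
    name: str
    checks: tuple

PLAN = (
    ValidationArea("Memory features", ("expansion", "pooling", "sharing")),
    ValidationArea("Coherency", ("host/device coherency over CXL.cache and CXL.mem",)),
    ValidationArea("Security and routing", ("link security", "routing across virtual hierarchies")),
    ValidationArea("Hot-plug", ("hot-add", "hot-remove across multiple domains")),
    ValidationArea("Performance", ("CXL.cache latency", "CXL.mem latency", "bandwidth correlation")),
)

if __name__ == "__main__":
    for area in PLAN:
        print(f"{area.name}: {', '.join(area.checks)}")
```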

Popular attributes
So why is CXL being adopted despite these challenges? Synopsys’ Solomon said CXL’s initial focus was on cache coherency, and the industry was interested in its asymmetric coherency protocol. It was only later that attention turned to overcoming the limitations of conventional memory attach and the DRAM interface.

“Now you’ve got this caching approach and this memory attachment, and each of those is driving CXL in different ways in the data center,” he explained. “For AI and machine learning, there are SmartNICs and data processing units, add-on devices for servers that focus on intelligently dealing with data in the server rather than on the host CPU. They’re really interested in a cache-coherent interconnect. For hyperscalers, CXL creates a separation between processor and memory that allows for more efficient allocation of resources between jobs that require varying amounts of volatile and non-volatile memory.”

“Additionally, being able to meet the low-latency interconnect and memory scaling requirements of data center applications makes CXL attractive,” Aggarwal said. “Since it uses the existing PCIe PHY layer, interoperability helps early adoption and extends the product life cycle.”

This makes CXL ideal for data center applications. “CXL provides cache coherency for CPU access to memory,” Keysight’s Asay said. “It also enables pooling of memory resources, which is ideal because it increases overall DRAM utilization in the data center.”

While CXL has multiple use cases, Arm’s Defilippi said cloud providers are highly optimistic about the ability to share memory capacity across a set of nodes and increase GB/vCPU for key applications. “It’s cost-prohibitive for cloud vendors to fully provision DRAM across all their systems. But by accessing a CXL-connected pool of DRAM, they can now take systems with just 2GB/vCPU and assign additional DRAM capacity, making these systems suitable for a wider range of workloads. For systems that are already heavily provisioned (i.e., 8GB/vCPU), additional CXL-attached memory becomes a gateway to applications that require large memory footprints, such as some ERP systems that cannot run in the cloud today.”
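
Defilippi’s GB/vCPU point can be made concrete with simple arithmetic. The sketch below uses illustrative numbers (not Arm’s) to show how a shared pool of CXL-attached DRAM raises the effective memory per vCPU of lightly provisioned nodes without re-provisioning every server.

```python
# Back-of-the-envelope sketch of memory pooling economics.
# All numbers are illustrative, not vendor data.

def effective_gb_per_vcpu(local_gb_per_vcpu: float, vcpus_per_node: int,
                          pool_gb: int, nodes_sharing_pool: int) -> float:
    """Local DRAM per vCPU plus this node's share of a CXL-attached pool."""
    pool_share = pool_gb / nodes_sharing_pool          # GB of pool per node
    total_gb = local_gb_per_vcpu * vcpus_per_node + pool_share
    return total_gb / vcpus_per_node

if __name__ == "__main__":
    # A lightly provisioned node: 64 vCPUs with 2 GB/vCPU of local DRAM.
    without_pool = effective_gb_per_vcpu(2, 64, pool_gb=0, nodes_sharing_pool=16)
    # Add a 4 TB CXL memory pool shared by 16 such nodes.
    with_pool = effective_gb_per_vcpu(2, 64, pool_gb=4096, nodes_sharing_pool=16)
    print(f"without pool: {without_pool:.1f} GB/vCPU")   # 2.0
    print(f"with pool:    {with_pool:.1f} GB/vCPU")      # 6.0
```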

The November 2020 release of CXL 2.0 introduced memory pooling with multiple logical devices, which Cadence’s Khan said was a key improvement to the specification. “This pooling capability allowed the sharing of resources, including system memory, across multiple systems. While CXL was designed for accelerators, it also supports memory interfaces. Tiered configurations can also support heterogeneous memories: high-bandwidth memory on the package, fast DDR5 attached to the processor, and slower memory on CXL modules. Memory is a significant cost item for data centers, and pooling is an efficient way to manage the system.”


Figure 1: CXL 2.0 introduced memory pooling with single and multiple logical devices. Source: Cadence
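
A minimal sketch of the bookkeeping behind the pooling shown in figure 1 appears below. In a real system a fabric manager assigns slices of a multi-logical-device (MLD) memory module to hosts over standardized commands; the class and method names here are hypothetical and model only the allocate/release accounting.

```python
# Minimal sketch of CXL 2.0-style memory pooling with a multi-logical-device
# (MLD) controller. A real system uses a fabric manager; this model only
# illustrates the allocate/release bookkeeping.

class PooledMemoryDevice:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations = {}            # host_id -> GB currently assigned

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def allocate(self, host_id: str, gb: int) -> bool:
        """Assign a slice of the pool to a host as a logical device."""
        if gb > self.free_gb:
            return False
        self.allocations[host_id] = self.allocations.get(host_id, 0) + gb
        return True

    def release(self, host_id: str) -> None:
        """Return a host's slice to the pool, e.g. when its job finishes."""
        self.allocations.pop(host_id, None)

if __name__ == "__main__":
    pool = PooledMemoryDevice(capacity_gb=1024)
    pool.allocate("host-A", 256)
    pool.allocate("host-B", 512)
    print(pool.free_gb)      # 256
    pool.release("host-A")
    print(pool.free_gb)      # 512
```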

CXL and custom designs
Last year’s introduction of CXL 3.0 takes things a step further with a fabric-like implementation of multi-level switching. “This allows the implementation of Global Fabric Attached Memory, which separates memory pools from processing units,” Khan said. “Memory pools also can be composed of different types of memory. In the future, we can envision a leaf/spine architecture with addresses for NICs, CPUs, memory, and accelerators, with an interconnected spine switch system built around CXL 3.0.”


Figure 2: CXL 3.0 offers a fabric-like implementation with multi-level switching. Source: Cadence
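
One way to picture the leaf/spine fabric Khan describes is as a simple graph of switches and endpoints, as in the sketch below. The node names are hypothetical, and the code models only connectivity, not CXL 3.0 routing or protocol behavior.

```python
# Purely structural sketch of a CXL 3.0-style leaf/spine fabric. Node names
# are hypothetical; only connectivity is modeled, not routing or protocol.

from collections import defaultdict

fabric = defaultdict(set)

def connect(a: str, b: str) -> None:
    fabric[a].add(b)
    fabric[b].add(a)

# Leaf switches attach endpoints: CPUs, NICs, accelerators, fabric-attached memory.
connect("leaf-1", "cpu-0")
connect("leaf-1", "nic-0")
connect("leaf-2", "gpu-0")
connect("leaf-2", "gfam-mem-0")

# Spine switches interconnect the leaves.
for leaf in ("leaf-1", "leaf-2"):
    for spine in ("spine-1", "spine-2"):
        connect(leaf, spine)

def hops(src: str, dst: str) -> int:
    """Breadth-first hop count between two fabric nodes (-1 if unreachable)."""
    frontier, seen, depth = {src}, {src}, 0
    while frontier:
        if dst in frontier:
            return depth
        frontier = {n for node in frontier for n in fabric[node]} - seen
        seen |= frontier
        depth += 1
    return -1

if __name__ == "__main__":
    # cpu-0 -> leaf-1 -> spine -> leaf-2 -> gfam-mem-0
    print(hops("cpu-0", "gfam-mem-0"))   # 4
```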

This is relevant for data centers because there is no one-size-fits-all system architecture in the AI/HPC world.

Khan explained that today’s servers provide a reasonable superset of what these applications might need, often resulting in lower utilization and energy waste. “Heterogeneous applications call for very different solutions for optimized implementation. Typical application workloads for HPC/AI/ML each have different system requirements. The vision for disaggregated systems is to build large banks of resources: memory, GPUs, compute, and storage that can be combined into flexible, composable architectures as needed. In other words, CXL’s features pave the way for disaggregated and composable systems.”
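
To make the composability idea concrete, the toy sketch below carves logical systems out of shared resource banks on a per-workload basis. It is a hypothetical illustration, not a real orchestration API.

```python
# Toy sketch of composable infrastructure: carve a logical system out of
# shared resource banks per workload. Hypothetical, not a real orchestrator.

BANKS = {"vcpus": 2048, "gpus": 64, "memory_gb": 65536, "storage_tb": 512}

def compose(name: str, **request) -> dict:
    """Reserve resources from the shared banks for one workload."""
    for resource, amount in request.items():
        if BANKS.get(resource, 0) < amount:
            raise RuntimeError(f"not enough {resource} for {name}")
    for resource, amount in request.items():
        BANKS[resource] -= amount
    return {"name": name, **request}

if __name__ == "__main__":
    training = compose("ml-training", vcpus=128, gpus=16, memory_gb=8192)
    analytics = compose("in-memory-analytics", vcpus=256, memory_gb=16384)
    print(training, analytics, BANKS, sep="\n")
```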

CXL’s memory model also opens the door to new custom CXL devices, such as pooled memory controllers.

“Another emerging use case will be heterogeneous computing, using cache coherency within CXL devices for memory sharing between the host CPU and the CXL-connected device,” Defilippi said. “The programming model here is still being worked out, but the goal is to be able to share large datasets between the host and the accelerator, which is very attractive for things like ML training. With development also underway on a host of custom AI chips and GPUs/NPUs, this can be an attractive option.”
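
Because that programming model is still being defined, the contrast below is purely conceptual: a copy-based accelerator flow versus a coherently shared buffer. The function names are hypothetical and stand in for whatever APIs eventually emerge.

```python
# Conceptual sketch only: a copy-based accelerator flow versus a coherently
# shared buffer. Function names are hypothetical; the real CXL programming
# model for host/accelerator sharing is still being worked out.

def train_step_with_copies(dataset: list) -> list:
    staged = list(dataset)                    # explicit copy into device memory
    results = [x * 2 for x in staged]         # stand-in for accelerator compute
    return list(results)                      # explicit copy back to the host

def train_step_with_shared_buffer(dataset: list) -> list:
    # With hardware coherency, host and accelerator can operate on the same
    # buffer, so a large training set never has to be staged or copied back.
    for i, x in enumerate(dataset):
        dataset[i] = x * 2                    # stand-in for accelerator compute
    return dataset

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(train_step_with_copies(data))        # copies dominate at large scale
    print(train_step_with_shared_buffer(data)) # zero-copy under coherent sharing
```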

When it comes to CXL in custom chip designs for data centers, Keysight’s Asay noted that if they want cache coherency or access to shared memory resources, those designs should be built to the CXL specification to ensure interoperability. “A common custom chip design is the SmartNIC, where CXL has become very popular as a technology for data transmission.”

Security also matters. Synopsys’ Aggarwal sees security features at the transaction and system levels driving custom designs for data-sensitive applications, as many companies work on their own application-level interfaces on top of CXL.

Conclusion
There are other possibilities for customization within the broader memory ecosystem as it relates to data centers and HPC, including combining open standards to create new products.

Blueshift Memory is the UK-based chip startup behind an alternative memory architecture called the Cambridge Architecture. The company is using RISC-V and CXL to deploy the technology. CEO and CTO Peter Marosan said that using these open standards saved the company a potential $10 million cost for an off-the-shelf CPU from a manufacturer, and “opened the door to the market for us and our entire group.”

As for what’s on the horizon, Gary Ruggles, senior product marketing manager at Synopsys, said he’s starting to see the first inquiries for both CXL 2.0 and CXL 3.0 from the automotive sector. “When you look at cars now, they’re roaming supercomputers. It shouldn’t be surprising that these people are seeing exactly the same things we’re seeing in the data center.”
