Rethinking water use in data center cooling systems for the AI era

AI workloads are fundamentally changing data center requirements. What’s different about cooling in the AI era?

AI changes everything about thermal design. We’re moving from relatively predictable, CPU-based workloads to GPU-driven environments with extreme power densities and highly dynamic load profiles. That means heat is more concentrated, more variable, and much harder to manage with traditional, air-based systems. Cooling is no longer just about maintaining ambient temperature; it’s about removing heat precisely at the source — at the chip level — in real time. That’s where direct-to-chip liquid cooling becomes essential, not optional.

Water usage is becoming a major concern. How should operators think about WUE in this new landscape?

WUE, or water usage effectiveness, has become a critical metric because data centers are now competing for finite freshwater resources at scale. Some facilities consume millions of gallons per day, which is simply not sustainable long term. The challenge is that many traditional cooling approaches force a tradeoff: you can optimize for energy or water efficiency, but not both. Evaporative systems, for example, are energy-efficient but water-intensive. What we’re focused on is breaking that tradeoff. The goal is to enable high-performance cooling with minimal or zero water consumption at the facility level.

How does liquid cooling change the equation for both performance and sustainability?

Liquid cooling fundamentally improves heat transfer efficiency. Water or dielectric fluids can absorb and carry away heat far more effectively than air. With direct-to-chip approaches, you’re removing heat exactly where it’s generated instead of trying to cool the whole data hall.

This has two big advantages:

The first is higher compute density without thermal constraints.
The second is less reliance on energy- and water-intensive cooling infrastructure.

In many cases, you can eliminate or significantly reduce cooling towers altogether, which has a direct, positive effect on WUE.
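As a rough illustration of why liquid carries heat away more effectively than air, the sketch below compares the volumetric flow needed to remove the same heat load with water and with air, using the relation Q = density × flow × specific heat × temperature rise. The fluid properties are textbook approximations near room temperature, and the 100 kW rack load and 10 K temperature rise are assumed figures chosen for illustration, not numbers from this interview.

```python
# Approximate fluid properties near room temperature (textbook values, assumed).
WATER = {"density_kg_m3": 997.0, "cp_j_kg_k": 4180.0}
AIR = {"density_kg_m3": 1.2, "cp_j_kg_k": 1005.0}


def volumetric_flow_m3_s(heat_w, delta_t_k, fluid):
    """Flow rate needed to carry heat_w watts with a delta_t_k temperature rise."""
    return heat_w / (fluid["density_kg_m3"] * fluid["cp_j_kg_k"] * delta_t_k)


rack_heat_w = 100_000.0  # hypothetical 100 kW rack
delta_t_k = 10.0         # hypothetical coolant temperature rise

water_flow = volumetric_flow_m3_s(rack_heat_w, delta_t_k, WATER)
air_flow = volumetric_flow_m3_s(rack_heat_w, delta_t_k, AIR)

print(f"Water: {water_flow * 1000:.2f} L/s")
print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Air needs about {air_flow / water_flow:.0f}x the volume of water")
```

Under these assumptions, air has to move a few thousand times more volume than water to carry the same heat, which is one reason room-level air handling struggles at GPU power densities.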

There’s a lot of discussion around different cooling approaches. What differentiates the most advanced systems?

Not all liquid cooling is created equal. The real innovation is how efficiently you can move heat away from the chip and reject it from the system. The difference is whether cooling is treated as a system-level optimization or simply an add-on to existing infrastructure.

More advanced approaches focus on four key things:

Minimizing or eliminating dependence on evaporative processes
Precisely cooling hot spots rather than the entire room
Operating at higher fluid temperatures to reduce energy overhead
Enabling “warm water” or chiller-less cooling architectures

Some solutions still rely heavily on facility-level cooling systems. Is that sustainable at AI scale?

That model becomes increasingly difficult to scale. When cooling depends heavily on centralized infrastructure, such as large cooling plants or water-intensive systems, you introduce constraints around energy, water availability, and physical footprint. What we’re seeing is a shift toward more distributed, modular cooling architectures that operate closer to the compute. This reduces losses, improves responsiveness, and gives operators more flexibility as workloads evolve. At AI scale, efficiency has to be engineered into the system, not bolted on afterward.

The majority of data centers have been in operation for years and rely on air-cooled systems. As they transform their white space for GPU-centric workloads, can they incorporate liquid cooling to handle the higher thermal loads?

Yes, but it requires thoughtful integration. Most existing data centers weren’t designed for the power densities and thermal loads associated with AI, so retrofitting isn’t as simple as swapping one cooling method for another. That said, many operators are successfully adopting hybrid approaches, introducing liquid cooling into targeted zones within the existing data hall. This allows them to support GPU clusters without overhauling the entire facility.

The key is modularity and scalability. Modular liquid cooling systems can be deployed incrementally, rack by rack or row by row, while coexisting with legacy air-cooled infrastructure. This minimizes disruption while enabling operators to gradually increase capacity and efficiency. From a WUE perspective, retrofits also create an opportunity to reduce reliance on water-intensive cooling methods.

How do direct-to-chip liquid cooling and immersion cooling compare in terms of WUE and PUE, and what’s the realistic adoption timeline for each?

Both approaches significantly outperform air cooling, but they differ in how they impact WUE, PUE, and deployment timelines. Direct-to-chip liquid cooling improves PUE by removing heat efficiently at the source and can reduce dependence on evaporative cooling, helping improve WUE. It’s also easier to retrofit, making it the fastest path to scale for AI workloads. Immersion cooling can push efficiency even further, particularly for extreme densities, with strong potential benefits for both WUE and PUE. However, it requires fundamental changes to hardware, operations, and facility design. As a result, it’s still largely in the pilot or early adoption phases, while direct-to-chip is becoming the near-term standard.

How should data center operators balance WUE with other metrics like PUE and CUE?

This is where a holistic efficiency framework becomes essential. Optimizing one metric in isolation can create unintended consequences elsewhere. For example, you can lower PUE while increasing water consumption. You can improve compute performance while increasing cooling complexity. And you can reduce energy use but rely on carbon-intensive sources. The leaders in this space are the ones managing these interdependencies deliberately, and cooling sits right at the center of that balance. It directly impacts energy, water, and carbon outcomes.
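To make the interdependence concrete, here is a minimal sketch that computes the three metrics side by side using their commonly cited definitions: PUE as total facility energy over IT energy, WUE as site water use in liters per kWh of IT energy, and CUE as kilograms of CO2-equivalent per kWh of IT energy. The two facility designs, their annual totals, and the grid carbon factor are hypothetical values chosen only to show the tradeoff, not measurements from any real site.

```python
from dataclasses import dataclass

GRID_KG_CO2E_PER_KWH = 0.4  # hypothetical grid carbon intensity


@dataclass
class AnnualFootprint:
    """Annual totals for one facility. All figures used below are hypothetical."""
    it_energy_kwh: float
    total_energy_kwh: float
    water_liters: float

    @property
    def pue(self):
        # Total facility energy per unit of IT energy (dimensionless).
        return self.total_energy_kwh / self.it_energy_kwh

    @property
    def wue(self):
        # Liters of site water per kWh of IT energy.
        return self.water_liters / self.it_energy_kwh

    @property
    def cue(self):
        # kg CO2e per kWh of IT energy, assuming all energy comes from the grid.
        return self.total_energy_kwh * GRID_KG_CO2E_PER_KWH / self.it_energy_kwh


# Same IT load, two hypothetical cooling designs.
evaporative = AnnualFootprint(10_000_000, 11_500_000, 18_000_000)
closed_loop = AnnualFootprint(10_000_000, 12_500_000, 500_000)

for name, site in [("evaporative", evaporative), ("closed loop", closed_loop)]:
    print(f"{name:12s} PUE={site.pue:.2f}  WUE={site.wue:.2f} L/kWh  "
          f"CUE={site.cue:.2f} kg/kWh")
```

In this made-up comparison the evaporative design wins on PUE and CUE but uses far more water per kWh, which is the kind of single-metric blind spot described above.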

What role does innovation play in making AI infrastructure more efficient?

Innovation is the only way forward. AI isn’t slowing down, so infrastructure has to evolve to support it responsibly. It takes tighter collaboration across the ecosystem: chipmakers, system designers, and infrastructure providers all need to be aligned. That means:

Enabling new architectures that transform how power and thermal systems interact
Designing cooling systems that scale with compute density
Reducing resource consumption at the source, not offsetting it elsewhere

Looking ahead, what will define best-in-class cooling strategies for AI data centers?

Three things:

Precision: Cooling exactly where it’s needed, at the chip level
Efficiency: Minimizing energy and water use simultaneously
Adaptability: Supporting rapidly evolving AI hardware and workloads

The future isn’t about incremental improvements to legacy systems. There isn’t time for that. It’s about fundamentally rethinking thermal management as a core enabler of AI infrastructure.

A conversation with Flex’s Rick Payne, VP of design and engineering, on innovation, water efficiency, and next-gen thermal design