If you’re a data center operator looking to expand capacity, you’re keenly aware that securing access to the grid is job number one on your to-do list. No power, no data center, as they say. But the extraordinary demand placed on the grid by AI and high-performance compute (HPC) has another concern rapidly ascending the list: power quality. While data centers running traditional workloads have largely solved for power quality, AI/HPC applications are raising new challenges as the nature of compute changes.
Data centers rely on consistent, uninterrupted power to ensure uptime, protect equipment, and maintain operational efficiency. Assuring high power quality is the responsibility of both the utility and the data center operator. In this blog, we’ll take a look at some of the factors that influence power quality and what data center operators can do to protect their facilities, reduce the risk of financial penalties, and be good neighbors to everyone who shares the grid in the AI era.
Power quality refers to the reliability, stability, and cleanliness of the electricity supplied to the data center. Compute-intensive, time-sensitive AI processing and inferencing are particularly vulnerable to power anomalies such as voltage fluctuations, frequency deviations, harmonics, outages, and transient (one-off) events. The consequences can be immediate and harsh.
Uninterruptible power supply (UPS) failure is the No. 1 cause of major power-related outages. Harmonics can damage a UPS and the electrical equipment connected to it.
- Processor errors, memory instability, and storage system failures that interrupt access to data and corrupt results
- Unreliable, unrepeatable training results, latency spikes, and timeouts that affect model and algorithm integrity
- Node failures that affect large AI workloads running across multiple servers
- Brownouts that reset systems or drop active sessions
- Overheated power supply units or converters in high-density AI racks
- System throttling that initiates thermal shutdowns to protect components
- Transformer failure, which can be particularly costly in terms of downtime; the current lead time for new transformers can be two to four years, a risk even for those operating redundant systems
The grid wasn’t built for this
Power grids were built to handle typical supply and demand cycles, smoothing out peaks and troughs and adapting to irregularities. For the most part, they do this well despite the inherent complexity of turning energy into usable electricity and delivering it reliably.
But much of the world’s grid infrastructure was built in the 1960s and 1970s, when usage was easier to predict and manage. The incandescent lights, AC motors, and analog devices typical of that period produce linear power loads that don’t distort power quality: the electrical current they draw is proportional to the voltage applied.
Modern digital environments are a different story. Servers, LED lighting, and variable-speed HVAC units, for example, produce non-linear, spiky loads that require more sophisticated power quality management. Unlike the daily peaks and seasonal patterns of the past, spikes can occur at any time due to the variable demands of AI data centers, cryptocurrency mining, and the “electrification of everything” trend.
Additionally, the grid itself is now subject to the vagaries of renewable energy sources such as solar and wind that are far less predictable than fossil fuels or hydropower. There’s much more uncertainty in the system. Traditional utility planning frameworks were not designed with all of this in mind.
Although only 14% of data center outages are classified as serious or severe, they remain expensive. In terms of direct, opportunity, and reputational costs, 70% of all data center outages cost $100,000 or more, with 25% in the $1 million-plus range.
When it comes to the electricity pulled by AI-era data centers, think about it this way. Every time you flip a switch it disrupts the flow of energy. On. Off. On. Off.
That’s basically what a microchip is doing, only today’s advanced versions are turning currents on and off billions of times per second, and they’re drawing an extraordinary amount of power while doing so. When you consider that a single hyperscale data center may deploy millions of GPUs, CPUs, NPUs, and TPUs, you see what utilities are up against.
| Chip | Stands for | What it's used for | Power needs |
|------|------------|--------------------|-------------|
| CPU | Central Processing Unit | General-purpose computing that runs operating systems and applications | Medium |
| GPU | Graphics Processing Unit | Originally for graphics, now widely used for parallel processing in AI, gaming, and simulations | High |
| NPU | Neural Processing Unit | Accelerates AI tasks such as image recognition and voice processing; often used in phones and edge devices | Low to medium |
| TPU | Tensor Processing Unit | Specialized chip by Google for the high-speed training and running of deep learning models | High |
Why? Because a power grid is a shared resource. Utility engineers design and maintain it with three technical considerations top of mind: power quality, reliability, and the balance of supply and demand. They’re doing so on behalf of everyone using it, from families and small business owners to sprawling tech campuses and large manufacturing facilities. Disruptions caused by one affect all.
“Dirty” power: Is the data center the culprit?
Short answer: Yes, sometimes. Generally, voltage follows a rolling wave characterized by smooth, periodic oscillations (a sine wave), as depicted by the green line in Figure 1. International standards for steady-state loads, established in the mid-1990s to govern harmonic currents, voltage flicker, and other factors, have served data center operators well. Some have even established additional, more stringent standards for their own facilities.
Intel introduced the first general-purpose microprocessor in 1971; it consumed just 0.5 W of power. Today, NVIDIA’s Blackwell B200 GPU consumes up to 1,200 W.
But AI models cause massive, sudden surges in power usage, making “white space” within a data center — the room housing IT equipment such as servers, storage, and networking gear — the source of distortion. Drawing power in rapid, uneven bursts generates harmonics that distort the voltage wave (the blue and yellow lines).
It’s like continually throwing pebbles of different sizes into a small pond and seeing the ripples collide and distort as they bounce back from the shore. The high-frequency switching used in servers to regulate voltage adds even more electrical noise to the mix. And extreme weather events such as heat waves can further amplify harmonics as variable frequency drives (VFDs) adjust the frequency and voltage of electrical power supplied to cooling fans that reside in the data center’s “grey space,” where power distribution, cooling systems, and generators live. To borrow a phrase, “we have met the enemy, and he is us.”
If not properly filtered, all this chaos can be fed back into the grid itself, disturbing not just the data center’s power supply, but also that of every user on the electrical network. Power quality issues can damage sensitive equipment in hospitals, factories, telecom networks, and elsewhere. Transformers may fail, causing entire zones to go dark.
“Dirty” power rife with harmonics, voltage distortions, transients, imbalances, and other irregularities also increases energy loss, because power generation and transmission become less efficient. Since harmonics increase heat in electrical equipment, cascading effects significantly impact data center power usage effectiveness (PUE) as energy loss escalates, energy efficiency falls, the need for additional cooling rises, and power consumption jumps.
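To make the distortion in Figure 1 concrete, here is a minimal Python sketch, using made-up harmonic amplitudes, of how 5th and 7th harmonics (typical of non-linear, rectified loads) warp a clean 60 Hz sine wave and how total harmonic distortion (THD) is calculated:

```python
# Illustrative only: the harmonic amplitudes below are assumptions, not
# measurements from any real data center or power supply.
import numpy as np

F0 = 60.0                                          # fundamental grid frequency, Hz
t = np.linspace(0, 2 / F0, 2000, endpoint=False)   # two cycles of the waveform

clean = np.sin(2 * np.pi * F0 * t)                 # pure sine: the "green line"

# Assumed 5th and 7th harmonic content from non-linear loads
h5 = 0.12 * np.sin(2 * np.pi * 5 * F0 * t)
h7 = 0.08 * np.sin(2 * np.pi * 7 * F0 * t)
distorted = clean + h5 + h7                        # the distorted waveform

def rms(x):
    """Root-mean-square value of a waveform."""
    return np.sqrt(np.mean(x ** 2))

# THD = RMS of the harmonic components divided by RMS of the fundamental
thd = np.sqrt(rms(h5) ** 2 + rms(h7) ** 2) / rms(clean)
print(f"THD = {thd * 100:.1f}%")   # ~14.4% here, well above typical IEEE 519 voltage limits
```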
Figure 1. Harmonic distortion.
Real examples — and a new solution for subharmonics
While AI-centric data centers are still in the minority, McKinsey & Company projects that by 2030, about 70% of new data center capacity will be designed to support advanced AI workloads. And 74% of colocation providers are already investing in infrastructure upgrades to meet customers’ AI requirements, according to a survey by the Uptime Institute. The demand for AI and HPC applications is driving opportunity, but it’s also causing data center operators to hit the “pause” button as new challenges arise.
For instance, one hyperscaler told us that their data center buildout could cause power disturbances within a 200-mile radius of their location. To put that in perspective, had that data center been located in Paris, its presence would have been felt as far away as Brussels and the outskirts of London [Figure 2]. Another said they were going to buy enough generators this year to power the city of Chicago, which is home to 2.7 million people.
We’ve been working with our hyperscaler customers to develop solutions that address many of the challenges arising from AI/HPC computing. One such solution is our groundbreaking Capacitive Energy Storage System (CESS). This new technology supports and balances power supplies during large power transients (voltage or current surges) caused by sudden changes in electrical loads.
Figure 2. A 200-mile radius around Paris, France.
During testing, we found that while the harmonics issues from AI workloads can be mitigated through a variety of approaches, there are significant issues with subharmonics, which arise not from the power system itself but from the load waveform being reflected back through the power supply. Subharmonics are oscillations at frequencies that are a fraction of the fundamental (base) frequency, and load pulsing exacerbates them. While that may sound benign, subharmonics can not only degrade power quality and create issues with local generators, but also destabilize DC/DC converters, cause overheating, and lead to premature equipment failure. Power supply solutions such as active harmonic filters, harmonic-mitigating transformers, and UPS systems don’t resolve them.
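To see what load pulsing does to the current spectrum, here is a toy Python sketch with assumed numbers (a 60 Hz supply and a workload gated on and off at 15 Hz). It is purely illustrative and not a model of Flex CESS or of any particular power supply; the point is that pulsing pushes energy into frequencies below and around the fundamental:

```python
# Toy illustration with assumed numbers: a 60 Hz load current whose amplitude
# is gated on and off 15 times per second by a pulsing workload.
import numpy as np

F0, FS = 60.0, 6000.0                  # fundamental (Hz) and sample rate (Hz)
t = np.arange(0, 1.0, 1 / FS)          # one second of data

steady = np.sin(2 * np.pi * F0 * t)    # current drawn by a steady linear load

# A 15 Hz square "gate" modulating the load between a low and a high level
gate = 0.5 * (1 + np.sign(np.sin(2 * np.pi * 15.0 * t)))
current = steady * (0.6 + 0.4 * gate)

# Spectrum of the pulsed current: besides 60 Hz, energy shows up at the
# 45 Hz and 75 Hz sidebands and, more weakly, at 15 Hz itself.
spectrum = np.abs(np.fft.rfft(current)) / len(current)
freqs = np.fft.rfftfreq(len(current), 1 / FS)
for f in (15.0, 45.0, 60.0, 75.0):
    idx = int(np.argmin(np.abs(freqs - f)))
    print(f"{f:5.1f} Hz component: {spectrum[idx]:.3f}")
```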
Flex CESS counteracts the subharmonics without amplifying power and cooling requirements or shortening the lifespan of the chips running the AI/HPC workloads [Figure 3]. Not only does this resolve power quality and reliability issues inside the data center such as those mentioned at the outset, but it also keeps the subharmonics from negatively affecting the power grid itself.
Flex also partners with Comsys, utilizing its ADF portfolio of active dynamic solutions to monitor data center power supply and compensate for electrical imperfections thousands of times per second in order to reduce voltage disturbances and stabilize the grid.
Clean up to ramp up
Operators running AI/HPC workloads must find ways to do so without destabilizing the power supply for everyone else. Utilities are updating their interconnection rules as data center loads escalate, with some even requiring the submission of validated load models. The U.S. Federal Energy Regulatory Commission is taking a close look at direct power delivery co-location arrangements.
With compute intensifying and data centers proliferating, “better safe than sorry” is a good rule of thumb. Generally speaking, the grid is on the receiving end of dirty power. If the grid infrastructure is outdated or overloaded, it can spread disturbances back to the source and to other users, which is not only a financial and operational risk but a reputational one, too.
It is in data center operators’ best interests to do their part to clean it up by:
- Consulting on harmonics ahead of the system design to mitigate issues up front and create an agile strategy that easily accommodates expansion and upgrades
- Considering not just harmonics but subharmonics as well, and mitigating them with solutions such as the Flex CESS
- Deploying active, “smart” harmonic filters that constantly monitor electrical current and inject countersignals when harmonics are detected to keep them from sneaking onto the grid
- Using power factor correction equipment such as capacitor banks or dynamic compensation systems that reduce electrical “spillage” and make electrical systems more efficient (a short worked sizing example follows this list)
- Installing isolation transformers that confine noise and harmonics within the data center
- Collaborating with utilities to forecast and smooth large AI loads through smart grid coordination
- Adhering to utility interconnection standards such as IEEE 519 (U.S.) and EN 50160 (EMEA), which set limits on harmonic distortion levels and can carry financial penalties if not met, as well as the IEC 61000 series covering electromagnetic interference (EMI) emission and immunity
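To illustrate the power factor correction point above, here is a short worked example in Python using hypothetical load figures; actual capacitor bank sizing depends on measured loads and utility requirements:

```python
# Hypothetical sizing example: correcting a 500 kW load from a 0.80 power
# factor to a 0.95 target. The figures are assumptions for illustration.
import math

P_kw = 500.0          # real power drawn by the load (assumed)
pf_before = 0.80      # measured power factor (assumed)
pf_target = 0.95      # target power factor

# Reactive power before and after correction: Q = P * tan(arccos(pf))
q_before = P_kw * math.tan(math.acos(pf_before))   # ~375 kVAR
q_after = P_kw * math.tan(math.acos(pf_target))    # ~164 kVAR

# The capacitor bank supplies the difference in reactive power.
q_capacitor = q_before - q_after                   # ~211 kVAR
print(f"Capacitor bank rating ~ {q_capacitor:.0f} kVAR")
```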
Solving system-level challenges with system-level solutions
High power quality is an unsung hero. When it’s good, things just work. The lights come on. The machines run. But from the moment power quality begins to erode, consequences start to pile up. They may arrive by stealth, such as harmonics that go undetected yet cause seemingly mysterious equipment failures far from the source. They may scream their presence through a brownout or blown transformer that interrupts the proceedings immediately (and sometimes irrevocably). They may show up as a well-informed, proactive utility seeking to balance the needs of all its stakeholders.
Power quality is a system-level problem requiring system-level solutions. Flex works closely with leading chip companies and data center customers to address anticipated power quality challenges proactively in alignment with product roadmaps and changing architectures. With a full suite of critical and embedded power products and direct-to-chip cooling solutions, our unique vantage point extends from grid to chip, giving our customers valuable insights that inform comprehensive solutions for complex issues.