Artificial Intelligence (AI) systems operate in two distinct phases — training and inference. Though they rely on similar hardware and data, they serve very different purposes and require contrasting datacentre designs.
Training: Teaching the Model
AI training is the process of teaching a model to recognise patterns in data. It’s like showing a child thousands of pictures of cats and dogs until they learn the difference. In practice, training involves feeding billions of text, image, or video samples through neural networks and adjusting millions, and in frontier models billions, of parameters until the system can make accurate predictions.
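The mechanics are easier to see in code. The sketch below is a deliberately tiny, illustrative training loop in Python using PyTorch; the data, model size and hyperparameters are placeholders, not how any frontier model is actually configured.

```python
import torch

# Toy stand-ins for real training data: 1,024 samples with 16 features each,
# labelled as one of two classes (think "cat" vs "dog").
inputs = torch.randn(1024, 16)
labels = torch.randint(0, 2, (1024,))

# A tiny network; frontier models have billions of parameters, not hundreds.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = torch.nn.CrossEntropyLoss()

# The training loop: predict, measure the error, adjust the parameters, repeat.
# At frontier scale this loop runs across tens of thousands of GPUs for weeks,
# not a few passes over a small array.
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()      # work out how each parameter should change
    optimizer.step()     # nudge the parameters to reduce the error
```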
This process is enormously compute-intensive. Training a large model such as GPT-4, Claude or Gemini can involve tens of thousands of GPUs running continuously for weeks. These clusters need vast amounts of inexpensive power, dense (often liquid) cooling, and extremely high-bandwidth networking between GPUs.
Because training doesn’t demand low latency, these datacentres are usually located where power is plentiful and inexpensive. To date, the majority of the investment in these large training campuses has been in North America. However, restrictions on power availability in North America are forcing Big Tech to look again at “ready to deliver” campuses in EMEA and APJ to meet the sharp increase in demand for training capacity over the next three years.
Inference: Using What’s Been Learned
Once trained, the model can perform inference — using its knowledge to make predictions or generate responses. Each inference task is far lighter than training, but it occurs billions of times a day. While a new model might be trained once every few months, inference happens continuously for millions of users.
In simple terms: training builds the model; inference puts that model to work, billions of times a day.
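For contrast with the training loop above, here is a minimal inference sketch, again with a placeholder model and inputs rather than any real production API.

```python
import torch

# A production system would load trained weights from a checkpoint; an
# untrained toy network stands in here so the sketch runs on its own.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
)
model.eval()                  # weights are frozen; no further learning

with torch.no_grad():         # no gradient bookkeeping, so each call is cheap
    query = torch.randn(1, 16)                  # one incoming user request
    answer = model(query).argmax(dim=1).item()  # a single forward pass
print(answer)
```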
The rise of AI is reshaping global datacentre investment. Yet training and inference generate different types of demand.
Training Datacentres: The AI Superclusters
Training facilities are vast, centralised, and power-hungry. They’re built for massively parallel computation with top-tier GPUs and networking. These sites are measured in hundreds of megawatts and cost billions of dollars to construct. Their focus is efficiency per model, not proximity to users.
Inference Datacentres: Distributed and Scalable
Inference datacentres, on the other hand, are smaller but far more numerous. They must respond to users in milliseconds, which means sitting close to end users and their data, whether in regional cloud clusters or on premises.
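A rough speed-of-light calculation shows why proximity matters; the distances below are illustrative assumptions.

```python
# Light in optical fibre covers roughly 200 km per millisecond, so distance
# alone eats into a tight latency budget before any compute happens.
SPEED_IN_FIBRE_KM_PER_MS = 200

for distance_km in (50, 500, 5000):
    round_trip_ms = 2 * distance_km / SPEED_IN_FIBRE_KM_PER_MS
    print(f"{distance_km:>5} km from the user -> ~{round_trip_ms:.1f} ms round trip before any compute")
```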
A typical inference rack may draw 12–50 kW, but with vastly more deployments needed to handle global usage, the aggregate demand becomes immense.
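A back-of-envelope calculation makes the point; the fleet sizes below are assumptions chosen purely for illustration, not forecasts.

```python
# Illustration of how modest per-rack draw aggregates across a distributed fleet.
KW_PER_RACK = 30          # mid-range of the 12-50 kW figure above
RACKS_PER_SITE = 2_000    # assumed racks in one regional inference site
SITES = 500               # assumed number of inference sites worldwide

site_mw = KW_PER_RACK * RACKS_PER_SITE / 1_000
fleet_gw = site_mw * SITES / 1_000
print(f"One site: ~{site_mw:,.0f} MW; a fleet of {SITES} sites: ~{fleet_gw:,.1f} GW")
```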
In the short term (2025–2028), training will dominate capital expenditure as Stargate, Microsoft, Google, Amazon and NVIDIA build “AI superclusters.”
But by the late 2020s, inference will become the main growth engine — in total compute hours, number of sites, and aggregate energy use.
While training remains concentrated in a few huge sites, inference occurs everywhere. Every chatbot query, every AI-generated email or translation, every automated driving decision is an inference event.
Even though each inference request uses far less compute, the sheer volume of requests will dwarf training over time. Inference will eventually consume more total power than training, despite smaller rack sizes. Inference will run in real time, close to where user applications and data reside. With 70% of enterprise applications forecast to be in the cloud by 2028, the majority of inference processing will co-locate with high-availability cloud clusters.
Others will take the form of smaller regional or edge datacentres — sometimes just a few megawatts each — built close to 5G towers, enterprise campuses, or local ISPs to ensure ultra-low latency.
Training vs Inference: A Quick Comparison
| Feature | Training Datacentre | Inference Datacentre |
| --- | --- | --- |
| Purpose | Create and refine AI models | Use trained models for predictions/responses |
| Scale | Huge (100–500 MW sites) | Smaller, many (1–50 MW sites) |
| Location | Remote, power-abundant regions | Near users |
| Latency Requirement | Low priority | Critical (<50 ms) |
| Power Density | 80–120 kW/rack | 12–50 kW/rack |
| Growth | Strong up to 2028 | Rapid growth expected beyond 2030 |
| Long-term Share (post-2030) | Fewer but massive sites | Many sites; total demand larger |
Over the next three years, the world will continue pouring billions into massive training clusters — the supercomputers that create the next generation of foundation models. But as those models reach maturity and are deployed into everyday life — from customer service to healthcare, finance, and transport — inference will explode.
By the early 2030s: