The Illusion of Unlimited Compute
It wasn’t all that long ago that organizations purchased and maintained their own computers to run websites, mail servers, databases, and the myriads of other applications that make up our techno-landscape. This was a hassle that distracted the organization from its core purpose.
Along came cloud computing as a solution. You could rent just the computing resources you needed, when you needed them.
These computers of course have to exist somewhere in the physical world, which is where datacenters come into play. Datacenters housed computers before the cloud paradigm, but they often housed many different customers. Cloud computing birthed the hyperscaler: large datacenters wholly dedicated to supporting cloud services for a single platform.
Most software can be run very efficiently in a hyperscale cloud. This meant good profits for cloud providers and as much computing capacity whenever customers wanted it.
It was easy to forget that there are physical limits for computer resources.
The emergence of generative AI and large language models don’t conform as nicely to this model of unconstrained capacity. If you want to understand more about what’s different running genAI and AI datacenters check out my latest episode of Talk To Me Petey D.