The AI Battle Really Begins


In late 2022, numerous Azure capacity issues were reported in the media. At the time, the lack of capacity was blamed on “supply chain” problems.

But a quarter or two later, OpenAI released GPT-4, which showcased new capabilities previously unseen in Large Language Models (LLMs), and apparently required a massive amount of computation from Azure to train.

The novelty and usefulness of GPT-4 were apparent immediately. An incredible early paper from Microsoft showcased how GPT-4 could be the beginning of Artificial General Intelligence: a multi-modal, general-purpose reasoning machine with a mostly human-level understanding of the world.

GPT-4 was so disruptive, in fact, that OpenAI is now on track to generate $1 billion in annual revenue from its ChatGPT product less than a year after launch.

Other companies are now waking up.

The High Costs of Doing Machine Learning

H100 GPU Cluster

If you’re not actively following the AI space, you may have missed NVIDIA’s Q2 FY2024 results. They were unbelievably good, given the size and scale of the company. Data center revenue was up 141% from the previous quarter, and up 171% from a year ago.

It seems every large company is now scrambling to catch up in the AI game, and in the process is spending outlandish sums of money to build datacenters filled with the latest NVIDIA GPUs necessary for training.

To be clear, each of NVIDIA’s latest GPUs (the H100) costs around $34,000, and is generally deployed in machines that pack 8 GPUs into a single server. These machines cost between $300k and $400k each.

For reference, a single training run for a 70B parameter language model (Llama 2 in this example) uses 1,720,320 GPU hours of compute, at 400W of power draw per GPU.

(In this case, Llama 2 was trained on A100s, the prior GPU generation. But the hours and investment are comparable for training a given model.)

Training the model in a week would take 10,240 GPUs running 24/7, at a cost of around $350 million in GPUs alone!

If we wanted to avoid CapEx and instead use AWS, a p4d.24xlarge costs $32.77 per hour and comes with 8 A100 GPUs. That’s an affordable $7 million to train one model run, ignoring the costs of data transfer, debugging, and setting up data pipelines (if capacity is even available).
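The back-of-the-envelope math above can be sketched in a few lines. The GPU hours come from the Llama 2 figures quoted earlier; the $34,000 H100 price and the $32.77/hr p4d.24xlarge on-demand rate are the rough numbers used in this post, not exact quotes:

```python
# Rough cost sketch for a Llama-2-70B-scale training run, using the
# approximate prices quoted in the text (assumptions, not exact quotes).

GPU_HOURS = 1_720_320        # reported GPU hours for the 70B training run
GPU_PRICE = 34_000           # approximate price of one H100, in USD
HOURS_PER_WEEK = 7 * 24      # 168 hours

# Option 1: buy the hardware (CapEx) and finish in one week.
gpus_needed = GPU_HOURS / HOURS_PER_WEEK
capex = gpus_needed * GPU_PRICE
print(f"GPUs to finish in a week: {gpus_needed:,.0f}")  # 10,240
print(f"GPU CapEx: ${capex:,.0f}")                      # $348,160,000 (~$350M)

# Option 2: rent on AWS. One p4d.24xlarge instance has 8 A100 GPUs.
INSTANCE_RATE = 32.77        # assumed on-demand $/hour per instance
GPUS_PER_INSTANCE = 8
instance_hours = GPU_HOURS / GPUS_PER_INSTANCE
rental = instance_hours * INSTANCE_RATE
print(f"AWS rental: ${rental:,.0f}")                    # ~$7M
```

Either way, the entry price is in the millions before a single experiment succeeds, which is the whole point of the comparison.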

Given the extremely high costs associated with developing these state of the art models, SemiAnalysis has coined the term “GPU rich” vs “GPU poor”. Companies which have invested heavily in GPU infrastructure prior to the GPT-4 explosion are considered GPU rich, and capable of building state of the art models, while everyone else is considered GPU poor and scrambling to catch up, but currently locked out of building these models.

This is a wild experience, because it’s the first time in my life it’s been impossible to develop a kind of software without access to a giant pool of capital and proprietary data.

The thing that made software interesting to me as a young person was the low cost and low barrier to entry. Anyone, anywhere in the world, could build software and contribute to the global conversation of software.

A compiler to build programs was free, and an operating system to run them was free, thanks to GNU and Linux. Everyone working in or contributing to open source could take from and give freely to the overall value available in open source. This open ecosystem led to the amazing growth of the cloud and software in general.

Trillions of dollars in tech company valuations and returns were only possible because of the open source ecosystem.

AI Success Is a Perfect Storm for Capital Concentration and Increased Inequality

funding secured

Given the massive costs associated with building or renting a GPU cluster, the potential moat available to companies with access to the giant pools of capital necessary for GPU clusters seems unparalleled.

And indeed, this is why we’re seeing pre-product startups raising hundreds of millions of dollars: the cost of entry into the state-of-the-art machine learning model business is literally hundreds of millions in compute.

For someone used to the previous, cloud based and open source software paradigm, there couldn’t be a wider difference.

Suddenly the costs of building a product aren’t concentrated in the humans necessary to build the software itself, but rather in the raw costs of materials, energy, and data necessary to participate.

Which sets artificial intelligence up for a socially dangerous feedback loop.

There will only be so many superclusters in the world capable of training these giant models, and only a relative few engineers using and gaining experience with them.

The talent pool will concentrate and shrink, and the pressure to deliver will grow on the people running and training these models.

As models like GPT-4 show the potential to automate and destroy the vast majority of the best-paying jobs available to workers (that is, knowledge work in general), we’re looking at a future where existing capital structures have the potential to become permanently self-perpetuating, built largely off training data taken from the public’s work.

As AI’s capabilities grow, the value produced by a substantial number of previously well-employed knowledge workers may be permanently captured by these models, trained on data taken from them, without compensation for their efforts.

The Open Source Machine Learning Hero is… Meta?!


Given the costs associated with training these models, the moat for OpenAI’s GPT-4 appeared to be enormous and durable. But then Meta (previously Facebook) released Llama.

Llama was originally supposed to be a large language model released only to researchers, on a case-by-case basis. Inevitably, though, the model leaked and showed up on torrent websites. Soon enough, a model that cost millions to create was in the hands of anyone who wanted to try it out and knew how to use a torrent link.

This led to a rapid explosion in public progress around large language models. Researchers fine-tuned the leaked model on GPT-4 output and made the results public. Shortly afterwards, a reasonably useful model could run on consumer-level hardware, thanks to projects like llama.cpp.

With that rapid public progress, Meta has continued to release powerful models, focused on computer vision, language, and more.

Better still, they’ve released some of their models in a way that allows for commercial use. And most recently, they released a completely free large language model, Llama 2.

The moat inherent in machine learning’s costs is becoming less apparent. Especially when you consider a trained model could always leak.

The Moat for AI is Now Secrecy and Paranoia

Dario Amodei

In an interview, Anthropic CEO Dario Amodei spoke a bit about his approach to cybersecurity when training his models.

He mentioned that some of the edge for his company relies on what he calls “compute multipliers”. The core idea is that they have discovered training optimizations that are the equivalent of having more compute.

At his organization, he’s implemented a compartmentalization strategy, similar to a spy agency’s, where no one person could leak the overall approaches necessary to train Anthropic’s Claude models.

What’s interesting is that in the same interview, he largely beats around the bush on how much the business model of AI relies on the model never being leaked or hacked. There is an acknowledgement that governments will be able to hack and steal any model they like, provided the model in question shows high enough value. But there isn’t an equivalent belief that an insider might simply leak the same.

Similarly, for now there is a relatively minor moat in the largest language models: running inference after training may require multiple $34,000 GPUs. But again, the open source work of projects like llama.cpp, Hugging Face, and others is chipping away at that moat.

The Vibes of Training AI Models are Kinda Wack

the future is great

So let’s recap the dynamics of privately funded AI:

All of this is happening behind closed doors, with giant pools of capital building super machines, using humanity’s collective knowledge and knowledge labor as input.

To capital, AI seems to be the ultimate privatization of public knowledge, along with the ability to steer what will certainly become “the reference”: an “unbiased” source of information for a substantial portion of humanity.

Control of the knowledge base from which humanity takes its reference is, of course, slightly interesting to intelligence agencies and governments.

Given the vibes and given the costs, why even try to participate? Why not leave the AI game to the “grown ups”, and let them tell you the future?

Open Source And the Battle Against Techno-Disillusionment

a real battle

I’ve been lucky enough over the past decade’s worth of technological improvements to have had the tiniest seat at the table. I’ve been able to steer at least a part of the conversation for what the future of technology and software will look like.

My peers outside of tech, however, have mostly not had a voice. In an increasingly online and software-driven world, most people living within it don’t have a say in how it should work.

Instead, it’s mostly been venture funded organizations seeking massive growth who have built the digital worlds everyone inhabits.

This mostly worked out fine for the past 10 years because VC was mostly in the business of giving money away. Apps like Uber were famously money losing, scaling in an effort to build out market dominance. In the meantime, humanity got a sweet discount on rides.

But this new generation of technological improvements threatens to turn the disillusionment dials up to 11. Imagine only a few thousand people steering all of the most important software used by the rest of the world. Imagine the power games and pressure that come with such a limited number of people working in the space. Imagine the immense pressure to return billions in invested capital.

And add the lack of accountability inherent in a machine learning model that is incomprehensible to humans.

The stakes are simply too high for artificial intelligence not to be developed in the open.

The GPU Poors are our best hope.