The Commoditization of AI

There's been a lot of recent buzz about DeepSeek and how it's offering AI models at a fraction of competitors' prices. But if you take a step back, it's simply the latest data point in one of the defining trends in AI over the past year: the rapid commoditization of Large Language Models (LLMs). As these once-rarefied technologies have become increasingly accessible and interchangeable, the industry's focus has naturally shifted from raw model capabilities to their practical implementation and integration. Let's take a closer look at the underlying forces at work.

Growing Competition at the Frontier

At the close of 2023, GPT-4 stood virtually unchallenged in terms of performance. However, 2024 witnessed a remarkable narrowing of this performance gap, particularly from open-source alternatives. Meta's Llama family of models led this charge, alongside impressive offerings from NVIDIA, Mistral, and DeepSeek. They demonstrated that high-performance language models need not be the proprietary domain of a select few providers.

Figure 1: Proprietary vs Open Source AI performance on PhD-level science questions, from Epoch AI.

This proliferation of frontier-competitive, but open-source models has led to a vibrant ecosystem that offers more choice and flexibility than ever before.

Plummeting Prices

With the proliferation of competition has come a precipitous drop in LLM pricing. To put this in perspective: in December 2023, OpenAI charged $30/mTok (per million tokens) for GPT-4 input and $1/mTok for GPT-3.5 Turbo. Today, their most advanced model, o1, is comparably priced, but GPT-4o is available at $2.50/mTok. That's a stunning 92% price reduction in a single year. Even more remarkably, GPT-4o mini costs just $0.15/mTok, making it not only nearly seven times cheaper than last year's GPT-3.5 Turbo but significantly more capable.

Competitors have been just as aggressive. Anthropic's Claude 3 Haiku, released in March, costs $0.25/mTok. Google's Gemini 1.5 Flash comes in at $0.075/mTok, while their Gemini 1.5 Flash 8B is priced at an astonishing $0.0375/mTok, roughly 27 times cheaper than GPT-3.5 Turbo was a year ago. DeepSeek has now entered this market with a splash, and is sure to add to the downward pricing pressure. This race to the bottom is a classic hallmark of commoditization.
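As a quick sanity check, the reductions cited above work out directly from the published per-million-token rates:

```python
# Per-million-token (mTok) input prices cited in this post.
gpt4_dec_2023 = 30.00      # GPT-4 input, Dec 2023
gpt4o = 2.50               # GPT-4o input, a year later
gpt35_turbo = 1.00         # GPT-3.5 Turbo input, Dec 2023
gpt4o_mini = 0.15          # GPT-4o mini input
gemini_flash_8b = 0.0375   # Gemini 1.5 Flash 8B input

reduction = (gpt4_dec_2023 - gpt4o) / gpt4_dec_2023
print(f"GPT-4 -> GPT-4o: {reduction:.0%} cheaper")  # prints "GPT-4 -> GPT-4o: 92% cheaper"
print(f"GPT-3.5 Turbo -> GPT-4o mini: {gpt35_turbo / gpt4o_mini:.1f}x cheaper")
print(f"GPT-3.5 Turbo -> Gemini 1.5 Flash 8B: {gpt35_turbo / gemini_flash_8b:.1f}x cheaper")
```

The GPT-3.5 Turbo comparisons come out to about 6.7x and 26.7x, which round to the "seven times" and "27 times" figures above.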

Rising Importance of Inference

In parallel, the industry has shifted focus from pre-training to inference optimization. Prompting techniques like chain-of-thought and agentic architectures enable models to break down complex tasks into smaller reasoning steps, and have become central to improving model performance. Because these approaches require many inference calls per task, they put a premium on inference cost and speed.
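To see why multi-step inference magnifies the stakes, here is a back-of-envelope sketch of single-call versus agentic-loop cost. The token counts and step count are hypothetical illustrations, and the $0.15/mTok rate is borrowed from the pricing discussion above:

```python
# Rough cost comparison: one direct call vs. a multi-step agentic run.
# All token counts below are hypothetical, for illustration only.
PRICE_PER_MTOK = 0.15  # assumed input price, $/million tokens

def call_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Dollar cost of a single inference call at the given rate."""
    return tokens / 1_000_000 * price_per_mtok

single_call = call_cost(2_000)  # one prompt, one answer

# An agentic loop re-sends a growing context on each of 10 reasoning steps,
# so total tokens grow roughly quadratically with the number of steps.
agentic = sum(call_cost(2_000 + step * 1_500) for step in range(10))

print(f"single: ${single_call:.4f}, agentic: ${agentic:.4f}")
```

Even with cheap per-token pricing, the agentic run here costs about 40x the single call, which is why per-token cost and tokens-per-second throughput have become competitive battlegrounds.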

Specialized hardware companies like Groq and Cerebras have raced to meet this demand, with Cerebras achieving 2,130 tokens per second on its Llama 3.3 70B implementation. It's a clear sign that the value chain is shifting from raw model capabilities to optimizing how models execute multi-step reasoning.

Figure 2: AI speed, by inference hardware provider, from Artificial Analysis.

Beyond Raw Power

As LLMs become increasingly commoditized, providers are finding new ways to stand out. Google's Gemini boasts a massive 2-million-token context length, while OpenAI differentiates through web browsing capabilities. Anthropic's Claude excels at writing tasks, and Anthropic has introduced UI innovations like Projects. Smaller, specialized models like Microsoft's Phi-4 are carving out their own niches.

The real value increasingly lies in integration and application. We're seeing this in practical implementations like Claude within Cursor, and in the development of sophisticated user interfaces that make AI more accessible and useful. The actual value created from LLMs will be driven not by model size or theoretical capabilities, but by how effectively they can be applied to real-world workflows and integrated seamlessly into business processes.

For businesses and developers, this commoditization presents an unprecedented opportunity. As costs continue to decline, barriers to entry fall and focus can shift toward application-level innovation. The real revolution in AI won't just be about building bigger models; it'll be about making them an integral, invisible part of how work gets done.
