Key drivers of AI advancement
In recent conversations with the owner of a video production company, the topic of creative AI tools came up. This leader had been ruminating on how recent AI developments might impact the future direction of his firm. It's a subject I’ve previously encountered while experimenting with generative AI during my time at Amazon Ads. The burning question on everyone's mind, from this business owner to individual producers, is whether and when AI will be capable of fully automating video production. The answer isn’t clear, but in this article I’ll provide some context to help you gauge the progress of AI and interpret its evolution.
There are really three main drivers of AI improvement: the availability of computing power, the explosion of data, and increasingly sophisticated algorithms.
Driver #1: Availability of computing power
The latest generative AI technologies are built on large language models. They’re called “large” models because of the vast number of parameters they must learn during training. In general, the more parameters a model has, the better it performs. For instance, GPT-4 reportedly uses an astonishing 1.7 trillion parameters. The amount of computing power required to train a model like this is staggering, and has only recently become available. That’s because over the past 50 years, the number of transistors on our chips has doubled approximately every two years, a phenomenon known as Moore’s Law.
While this trend has slowed somewhat in the past decade, I expect our hardware will continue to improve for the foreseeable future. And if you believe we’ll have faster and faster chips, then you can reasonably expect that we'll be able to train larger, better-performing models over time.
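To make the arithmetic behind Moore’s Law concrete, here’s a quick back-of-the-envelope sketch. (The two-year doubling period is a rule of thumb, not an exact constant.)

```python
# Moore's Law as compound growth: chip capability doubles roughly every two years.
def moores_law_factor(years: float, doubling_period: float = 2.0) -> float:
    """Cumulative improvement multiple after `years` of repeated doubling."""
    return 2 ** (years / doubling_period)

# Over 50 years, that's 25 doublings -- a roughly 33-million-fold improvement.
print(f"{moores_law_factor(50):,.0f}x")  # 33,554,432x
```

Small, steady doublings compound into staggering gains, which is why models that were unthinkable to train twenty years ago are routine today.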
Driver #2: Explosion of data
While improvements in computing power have followed a steady, predictable curve, the amount of data that we collectively produce has grown exponentially. Hal Varian, Chief Economist at Google, helps put this into perspective: “Between the dawn of civilization and 2003, we only created five exabytes; now we’re creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes [i.e. 53 trillion gigabytes] — an increase of 50 times.”
The vast majority of this data is unstructured, and only a portion of the data produced each year is actually stored and carried forward into the following year. But even with those caveats, the growth of data has been incredible. It’s precisely these immense volumes of data, coupled with greater computing power, that have made it possible to train large language models like ChatGPT and Bard on such a wide range of topics.
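As a quick sanity check on the figures in Varian’s quote, the unit conversions work out like this (using decimal SI byte prefixes):

```python
# Decimal SI byte units.
GB = 10**9   # gigabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

# 53 zettabytes expressed in gigabytes: 53 trillion GB, as the quote notes.
print(53 * ZB // GB)        # prints 53000000000000 (53 trillion)

# How many copies of "all data created through 2003" (5 EB) fit in 53 ZB?
print(53 * ZB // (5 * EB))  # prints 10600
```

In other words, the entire data output of civilization up to 2003 now fits more than ten thousand times over into a single year’s projected total.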
Driver #3: Increasingly sophisticated algorithms
While the growth of computing power and data follows a relatively predictable pattern, the sophistication of our AI algorithms is anything but, which makes it the driver to watch most closely. The typical historical pattern is that a researcher makes a new architectural discovery that unlocks a major leap forward in performance. The industry rushes to build upon this fundamental breakthrough, but at some point performance begins to plateau and an AI winter inevitably follows.
The current hype around generative AI can be traced back to the discovery of the transformer architecture, first proposed by Google researchers in 2017. Along the way, many other model architectures drove AI progress: generative adversarial networks in 2014, recurrent neural networks dating back to the 1980s, and expert systems going all the way back to 1965. As you can see, the major breakthroughs occur rather sporadically and contribute the greatest uncertainty to the future of AI - even for the field’s luminaries. Marvin Minsky, co-founder of MIT’s AI laboratory, famously told Life Magazine back in 1970 that “from 3 to 8 years we will have a machine with the general intelligence of an average human being.”
Speech recognition as a case study
To better understand the development of generative AI, it may be helpful to trace the development of a different AI application: speech recognition (also known as speech-to-text). Speech recognition has quickly become a ubiquitous feature of phones and other smart devices. Part of its prevalence is that speech recognition has finally become “good enough.”
A crisp definition of “good enough” is important for evaluating the potential impact of AI. A good benchmark is to compare the quality of the AI’s work against that produced by an average human. For self-driving cars, an important yardstick is safety: in Tesla’s Q4 2022 report, drivers using Autopilot recorded one crash for every 4.85 million miles driven, compared with the most recent 2021 data from NHTSA showing an average of one crash every 652,000 miles. In the context of speech recognition, the average human is around 95% accurate when it comes to recognizing what someone else is saying.
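For what it’s worth, the ratio between those two crash rates is easy to compute, taking the figures exactly as reported above:

```python
# Miles driven per crash, as reported above.
autopilot_miles_per_crash = 4_850_000   # Tesla Q4 2022, Autopilot engaged
us_average_miles_per_crash = 652_000    # NHTSA 2021 national average

ratio = autopilot_miles_per_crash / us_average_miles_per_crash
print(f"{ratio:.1f}x")  # 7.4x more miles between crashes
```

A caveat worth keeping in mind: the two figures aren’t perfectly comparable (Autopilot is used disproportionately on highways), but the ratio still illustrates what an “average human” benchmark looks like in practice.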
Although speech recognition is pervasive now, it has actually been under research and development in some fashion since Bell Laboratories first designed the “Audrey” system in 1952. Thanks in part to DARPA research funding, major improvements were made in the 1970s and beyond. However, even as recently as 2001 the state of the art was only capable of 80% accuracy - well short of what mainstream applications require. It took another decade and a half before Google achieved human parity of 95% accuracy in 2017.
Coming full circle back to video production, one of the most impressive startups applying AI to video is Runway. When you look at the output of their Gen 2 model, it’s clearly not ready for prime time yet - the frame rate is choppy and there are noticeable artifacts. At the same time, the output is starting to feel like real video. Runway was founded in 2018 around building video editing tools for TikTok and YouTube creators. It wasn’t until 2022 that they made a breakthrough in generative AI, partnering with Stability AI to release Stable Diffusion, one of the leading diffusion models for image generation. Runway released its Gen 1 model for video generation in February 2023, followed in June by Gen 2. Even Runway’s brief history traces the familiar pattern of algorithm advancement: years of incremental improvement, until the leap into diffusion models in 2022.
This post has been a long-winded way of saying I don’t know when generative AI will become good enough to fully automate creative work like video production. The technology’s progress depends on computing power, the proliferation of data, and algorithmic advancement. Given the unpredictable nature of algorithmic innovation, it could happen next year or a decade or more from now. Regardless of the specific timing, the astute business leader won’t wait for that potential future. Instead, they’ll begin reinventing their workflows and creative processes around AI technology today.