Benchmarks for AI-driven productivity gains.
2023 has seen an explosion of interest in generative AI. Venture capitalists are investing in it, big tech companies are hyping it and businesses are rushing to pilot it. However, there have been precious few case studies that provide hard data on the impact of AI on operational efficiency. So how are business leaders to know what kind of productivity gains to expect when deploying AI?
Fortunately, academia has leaned into this question and published a number of working papers this year studying the impact of AI on different types of work tasks. Below are the key learnings from a few of the most interesting papers, as well as some of the major implications for business leaders considering deploying AI within their companies.
Writing tasks
Researchers at MIT published research on the impact of AI on business writing. The study involved 444 college-educated professionals from a variety of backgrounds (marketers, grant writers, data analysts, consultants, HR professionals, managers). Each was asked to write two pieces of work-related content, such as press releases, short reports, analysis plans and delicate emails. Everyone wrote the first piece unaided. For the second piece, the treatment group used ChatGPT to help. The treatment group completed the task in 37% less time than the control group. AI not only sped up the work, it also improved its quality. Evaluators scored the writing on a scale of 1-7, and the average grade improved by 15% when using AI, from an initial score of 4 to a final score of 4.6.
Strategy tasks
In September, Harvard Business School shared the results of its collaboration with BCG. The researchers gave each of 758 consultants a set of 18 realistic consulting tasks to gauge the impact of AI on productivity. Consultants using AI completed 12.2% more tasks on average and completed tasks 25% more quickly. As in the writing study, AI usage also boosted the quality of the output. Consultants using AI produced 40% higher quality work than consultants who didn’t use AI.
Coding tasks
Microsoft and GitHub partnered with MIT to study the impact of Copilot, an AI programming tool, on developer productivity. They recruited 95 professional programmers and asked them to write an HTTP server in JavaScript. The 45 developers in the treatment group used GitHub Copilot, and the 50 developers in the control group did not. The average completion time for the AI-using group was 71.2 minutes, compared to 160.9 minutes for the group without AI, which means that AI drove a 55.8% reduction in task completion time.
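To make the arithmetic behind that headline figure explicit, here is a minimal sketch in Python. The two averages come from the study as quoted above; the function names are illustrative, and the throughput conversion rests on an assumption the study does not make: that the same speed-up would hold across repeated, back-to-back tasks. Because the published averages are rounded, the computed reduction lands within rounding of the reported 55.8% rather than matching it exactly.

```python
def percent_reduction(baseline_minutes: float, treated_minutes: float) -> float:
    """Percentage drop in completion time relative to the baseline."""
    return (baseline_minutes - treated_minutes) / baseline_minutes * 100


def implied_throughput_gain(baseline_minutes: float, treated_minutes: float) -> float:
    """Percentage increase in tasks finished per unit of time.

    Assumption (not from the study): the speed-up holds for repeated tasks.
    """
    return (baseline_minutes / treated_minutes - 1) * 100


control_minutes = 160.9    # average completion time without Copilot
treatment_minutes = 71.2   # average completion time with Copilot

reduction = percent_reduction(control_minutes, treatment_minutes)
throughput = implied_throughput_gain(control_minutes, treatment_minutes)

print(f"Reduction in completion time: {reduction:.1f}%")
print(f"Implied increase in tasks per hour: {throughput:.0f}%")
```

The two numbers describe the same result in different units, which is why a time reduction and a productivity increase should not be compared directly: cutting completion time by a bit more than half corresponds to more than doubling the number of tasks finished per hour.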
Customer support tasks
Finally, a working paper published by the National Bureau of Economic Research studied the impact of AI by examining 3 million support conversations held by 5,179 customer support agents, of which 1.2 million conversations took place after AI was introduced. The study found that customer support agents using AI resolved 14% more issues per hour. These gains came from a reduction in the time it takes an agent to handle an individual chat, an increase in the number of chats an agent can handle per hour and an increase in the share of chats that are successfully resolved.
As you can see, the productivity improvements reported in these studies range from 14% to nearly 56%, depending on the task and the specific metric used. However, we recommend you think of these figures as an upper bound on the expected impact when deploying AI within your company. First of all, these studies were conducted in strictly controlled environments. The only time measured was the time spent on the test task, like writing or coding. In the real world, your employees will have to attend meetings, respond to emails and engage in a whole host of other typical business activities that aren’t captured in these studies.
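One rough way to see why the study figures act as an upper bound is to scale the measured time reduction by the share of the working day actually spent on the AI-assisted task. The sketch below is a back-of-the-envelope model, not something taken from the studies: the function name and the task shares (25%, 50%, 75%) are hypothetical, and it assumes AI has no effect on the rest of the day.

```python
def blended_time_savings(task_share: float, task_time_reduction: float) -> float:
    """Fraction of total working time saved when only part of the day is AI-assisted.

    task_share: fraction of working time spent on the AI-assisted task (0 to 1).
    task_time_reduction: fractional time reduction on that task (0 to 1).
    Assumption: time spent on everything else is unchanged.
    """
    if not 0.0 <= task_share <= 1.0:
        raise ValueError("task_share must be between 0 and 1")
    if not 0.0 <= task_time_reduction <= 1.0:
        raise ValueError("task_time_reduction must be between 0 and 1")
    return task_share * task_time_reduction


# 0.558 is the time reduction reported in the coding study.
# The task shares below are hypothetical values for illustration.
for share in (0.25, 0.50, 0.75):
    saved = blended_time_savings(share, 0.558)
    print(f"{share:.0%} of time on the task -> {saved:.1%} of total time saved")
```

Under this simple model, an employee who spends half of their time on the assisted task would save roughly half of the headline figure overall, and the savings shrink further as meetings and email take up more of the day.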
Second, many mid-market business leaders lack the resources to accurately measure internal productivity, and will therefore have to rely in part on anecdotal evidence. But the GitHub Copilot study shows the challenges with this approach: the participating engineers underestimated the impact of AI, self-reporting a 35% increase in productivity, while the measured reduction in task completion time was 55.8%.
Now, beyond giving a starting point for evaluating AI and the potential productivity gains it can unlock, what else should a mid-market CEO take away from these studies?
Not every task is suitable for AI
These studies were built around tasks that the latest generation of AI is particularly well-suited for, such as writing, coding and conversing. It's important to remember that there are many use cases where generative AI commonly hallucinates or is otherwise poorly suited. For example, ChatGPT can be terrible at solving simple arithmetic problems. And many developers I’ve spoken to report that while Copilot is helpful for common and simple coding tasks, it struggles with more niche problems.
In the BCG research above, some of the 18 tasks used in the study were chosen because they were deemed beyond the current capabilities of AI. When consultants used AI to support those tasks, they were 19 percentage points less likely to produce correct solutions than consultants without AI. Part of the risk in deploying AI is that even when it produces something incorrect, the output will often look plausible. Business leaders should carefully consider the tasks they wish to deploy AI against, as applying it in the wrong scenarios can lead to worse outcomes than not using it at all.
AI provides greater benefits for less-skilled workers
While the research confirms that AI can drive productivity improvements, all of the above studies also indicate that the gains were distributed unevenly across the test population. Specifically, less skilled workers benefit more from using AI than highly skilled workers. In the business writing study, the writers who scored the lowest on the initial task saw the greatest improvement when using AI, compared to those who initially scored higher.
In the BCG study, consultants who performed below average saw a 43% quality improvement when using AI, while consultants who were above average saw only a 17% improvement in quality.
The GitHub study specifically notes that “developers with less programming experience benefited the most”, and in the customer support research there was likewise a more pronounced improvement among less-skilled and less-experienced workers. Newer agents using AI saw a 35% increase in the number of issues resolved per hour, much higher than the average improvement of 14%.
This suggests that companies that are more heavily reliant on less skilled or experienced workers would benefit more from deploying AI. Conversely, companies with a highly skilled workforce should expect to see lower productivity improvements from leveraging AI with their current talent. And any company expecting to hire rapidly, whether due to high turnover or high growth, should consider using AI to quickly bring new employees up to speed and close any skill gaps.