Accelerating software engineering with AI
In the last year, AI coding assistants have become increasingly prevalent in software development workflows, prompting many organizations to assess their utility. We’ve heard multiple CTOs ask, “How should you use AI coding assistants with your engineering team?” Unfortunately, there isn’t a simple, one-size-fits-all answer. After attending the T3 Software Engineering Leadership Summit in NYC, where 50 engineering leaders from companies like The New York Times, Betterment, MoMA and Block gathered, it became clear that while the use of tools like GitHub Copilot, Cursor or Zed is pervasive, where and how they’re applied varies significantly across engineering teams.
What the Research Says
Before diving into our takeaways from T3, it’s essential to first review what academic research says about AI’s impact on coding productivity. Earlier this year, we published a meta-analysis of the research that included a study from GitHub citing a 55% productivity improvement. A more recent study by Princeton, MIT, UPenn, and Microsoft revealed a more modest 26% increase in pull requests on average when coding assistants were used.
For junior developers, the productivity boost ranged from 27% to 39%, while senior developers saw a more modest increase of 8% to 13%. This data reinforces prior research that indicates AI assistants provide disproportionate benefits to less-experienced developers compared to senior engineers.
Insights from the T3 Summit
At T3, engineering leaders were unanimous in their usage of AI coding assistants. But what was striking was how drastically the extent and manner of their usage differed. Some teams chose to use AI sparingly, whereas other teams were writing the majority of their code with assistants. One team insisted their developers write their own code, but chose to use AI to write tests. Another team did the complete opposite, and focused their engineering time on writing test code. They then used those tests to validate the code that the AI had written.
Which of these approaches is right for your team will depend heavily on your unique context. We distilled our thinking into a simple framework that can guide your decision-making.
Key Considerations for Engineering Leaders
Personal and Cultural Risk Tolerance
It quickly became apparent during the discussions that the personal risk tolerance of the engineering leader colored their entire team’s approach to using AI. For example, engineering leaders with a “move fast and break things” attitude were far more willing to trust AI-generated code. Others who described themselves as skeptical of new technology in general approached AI in a similarly cautious manner, only using its suggestions in very specific scenarios. When they did use AI, they insisted on reviewing every line of code in depth. Mark Zuckerberg captured these contrasting approaches well in a recent interview about Meta’s engineering culture:

“There’s a certain personality that goes with taking your stuff and putting it out there before it’s fully polished. I’m not saying that our strategy or approach on this is the only one that works. I think in a lot of ways we’re like the opposite of Apple. Clearly, their stuff has worked well too. They take this approach that’s like, ‘We’re going to take a long time, we’re going to polish it, and we’re going to put it out.’ And maybe for the stuff that they’re doing that works, maybe that just fits with their culture.”
Neither one is necessarily superior to the other. But you should be honest with yourself about the kind of leader you are and the kind of organization you lead when considering where and how you should deploy AI coding assistants.
Maturity of the Product and Business
Another critical determinant of AI adoption that we saw was the stage of the product life cycle that the engineering team was working in. Engineering teams tasked with improving mature applications with thousands or millions of users and paying customers were understandably far more conservative in their willingness to use code written by an AI. On the other hand, teams building for early-stage, pre-scale startups or innovation teams building proof-of-concept initiatives were far more willing to embrace coding assistants.

Surprisingly, we did not observe a strong correlation between industry vertical and tolerance for AI assistance. Engineering teams operating within highly regulated industries like healthcare, where the cost of being wrong can literally mean someone’s life, seemed to be using AI assistants as much as companies within arguably lower-risk verticals like media and entertainment.
Representation within Public Repos
Most engineering leaders realize that a Large Language Model (LLM) is only as good as the data set it’s trained on. The foundation models powering these coding assistants are effectively trained on whatever code is available on the public Internet. Some languages, like Python or Java, are extremely prevalent on platforms like GitHub or Stack Overflow. Others, like Rust or Lisp, are far less common. Since the model has many examples of the former in its training data, coding assistants tend to handle those needs quite well. For the latter, the tools may not perform as reliably.

You’ll need to experiment with the specific coding languages, frameworks and problem space that your team uses, but part of your usage of AI coding assistants will likely depend on how well-represented they are in the public domain.
Team Experience and Composition
As mentioned earlier, multiple research efforts have concluded that junior developers derive more benefit from AI assistants than their senior counterparts. The anecdotal evidence we’ve gathered corroborates this finding. Because LLMs are effectively a compression of the Internet, they are most likely to suggest code that appears frequently within the training data. The suggested code is therefore, by definition, an average or mediocre solution. You would expect your senior engineers to perform above average. They hold a higher standard for code quality and are often more set in their ways. As a result, they tend to scrutinize AI-generated code more thoroughly and will override it more frequently. AI can assist in writing code, but senior engineers emphasize that understanding the code’s functionality and correctness remains crucial, particularly during code reviews.

You’ll almost certainly get more value out of AI coding assistants if your engineering team skews less experienced. It’s worth noting that “experienced” in this case is relative to the specific engineering task at hand. For example, your team of experienced full-stack web engineers may be complete novices when it comes to building native mobile applications. In this hypothetical scenario, you would expect a tool like Cursor to be less useful while building a web-based tool, yet invaluable for creating an iOS app.
Other best practices
There were a couple of other topics that came up at T3 that should be top of mind. After you’ve decided how your team will use AI coding assistants, use these guidelines as you update your processes.
Make time to understand the code
Regardless of whether AI wrote your team’s code, it’s critical that these tools don’t excuse your engineers from deeply understanding their code. Developers should still be able to explain what their code is doing and why it’s the correct solution during code reviews. This level of comprehension ensures that AI-generated code is not just functional but also aligned with best practices, performance considerations, and the overall architecture of the project. Encouraging this critical thinking during code reviews will naturally lead to more thoughtful prompting strategies, allowing engineers to leverage AI more effectively as a tool rather than a crutch.
This isn’t a new issue per se. Long before generative AI, weaker developers would sometimes copy and paste snippets from Stack Overflow without fully understanding how they worked or how they fit into the codebase. The rise of AI coding assistants simply exacerbates this poor behavior. By prioritizing understanding of code over mere output, you’ll ensure your teams use AI properly as the support system that it is.
Consider multiple measures of productivity
While productivity gains were noted by several teams at T3, it’s essential to define what “productivity” means in this context. Some teams focused on sprint velocity, while the aforementioned academic study used pull requests as the KPI. Neither of these captures the full story. While AI tools may increase the quantity of code written, engineering leaders also care deeply about nuances these metrics don’t reflect, such as code quality, performance and long-term maintainability.
Unfortunately, there’s no silver bullet here. At the recent Developer Productivity Engineering Summit, thought leaders from Google indicated there is no perfect model for measuring developer productivity. Instead, they have resorted to using multiple measurements and data points to draw their conclusions.
AI coding assistants are becoming increasingly embedded in engineering workflows, but their optimal use depends heavily on team dynamics, project maturity, and the specific coding languages involved. Leaders must assess their own risk tolerance, team composition, and the nature of their projects to determine how best to leverage these tools. While AI assistants can enhance productivity, especially for less experienced developers, fostering a deep understanding of the code remains crucial to ensure both functionality and long-term maintainability. By thoughtfully integrating AI into development processes, teams can maximize its benefits while mitigating potential risks.