METR Study: AI Slows Veteran Developers by 19%

A surprising new study has revealed that seasoned open-source developers may actually work more slowly when assisted by generative AI. The research, conducted by Model Evaluation & Threat Research (METR), challenges the widespread assumption that AI tools inherently boost productivity for experienced coders.

Expectations versus outcomes

The study involved 16 veteran developers with years of experience contributing to mature code repositories. Beforehand, they predicted that AI would speed up their work by 24%, and afterward they believed it had made them 20% faster. The actual data told a different story: task completion times increased by 19% when developers used AI tools such as Cursor Pro paired with Claude 3.5 and 3.7 Sonnet.

This mismatch between perceived and actual performance was consistent even among developers familiar with these tools. The trial spanned 246 real-world tasks and 143 hours of screen recordings.
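To make the mismatch concrete, here is a minimal sketch of how the reported percentages translate into minutes. The 60-minute baseline task is a hypothetical assumption for illustration; only the three percentages come from the study.

```python
# Illustration of the study's headline numbers, applied to an assumed
# (hypothetical) 60-minute task completed without AI assistance.
baseline_minutes = 60.0

predicted_speedup = 0.24   # developers forecast AI would make them 24% faster
perceived_speedup = 0.20   # afterwards, they believed it had made them 20% faster
measured_slowdown = 0.19   # measured outcome: tasks took 19% longer with AI

predicted_time = baseline_minutes * (1 - predicted_speedup)  # 45.6 min
perceived_time = baseline_minutes * (1 - perceived_speedup)  # 48.0 min
actual_time    = baseline_minutes * (1 + measured_slowdown)  # 71.4 min

# Gap between belief and measurement, as a fraction of the no-AI baseline:
perception_gap = (actual_time - perceived_time) / baseline_minutes

print(f"predicted: {predicted_time:.1f} min")   # 45.6 min
print(f"perceived: {perceived_time:.1f} min")   # 48.0 min
print(f"actual:    {actual_time:.1f} min")      # 71.4 min
print(f"perception gap: {perception_gap:.0%}")  # 39%
```

The point of the sketch is that developers' self-reports were off not by a few percentage points but by roughly 39% of the baseline task time, which is why screen recordings rather than surveys were needed to surface the effect.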

Why developers slowed down

The slowdown, according to researchers, stemmed from several factors: unreliable outputs from AI, difficulty applying generated code to complex repositories, and the time spent verifying or reworking AI-generated suggestions. Developers were also found to spend more time waiting for responses from AI than actively writing code themselves.

While most participants had previous experience with large language models (LLMs), only 44% had used Cursor Pro before the trial, and even those users did not show a performance edge.

AI isn’t a universal boost

The findings offer a cautionary perspective on the widespread belief that advanced AI tools automatically improve coding speed and productivity. While AI may still benefit novice developers or be helpful in greenfield projects, METR warns that experienced engineers working in intricate codebases must critically evaluate the value of AI assistance.

The report also points to a broader challenge: overreliance on anecdotal success stories or benchmark testing may mislead organisations into overestimating AI’s immediate value.

By studying real-world developer behaviour in depth, METR underscores the need for more realistic assessments of AI tools before adopting them as default workflow accelerators.
