This article is an on-site version of our The AI Shift newsletter. Premium subscribers can sign up here to get the newsletter delivered every Thursday. Standard subscribers can upgrade to Premium here, or explore all FT newsletters.
Welcome back to The AI Shift, our weekly exploration of the latest evidence of how AI is changing jobs and the world of work. This week we’re looking at new research into whether AI is making knowledge workers more productive, and taking a step back to consider how one should even go about answering – or indeed asking – that question.

John writes

We’re now three and a half years into the generative AI era and a year into the agentic AI era, and while there is increasing consensus around AI’s capabilities and utility in general terms, there is remarkably little hard data on how much of a productivity boost it is providing. One of the earliest attempts to quantify this at the level of individual workers was carried out by the AI research non-profit METR, which found the striking result that while software engineers felt AI was helping them do their work 20 per cent faster, when precisely measured it was actually making them 20 per cent slower.

This week METR is back with a new survey-based study that tests whether different questions can elicit more useful results. This time it surveyed 350 technical knowledge workers, including software engineers, researchers and managers, asking them to assess how much more value they are producing in their jobs now that they are using AI, as opposed to simply how much more quickly they are able to accomplish tasks.

We already know the main problem with current measures of personal productivity gains — they’re self-reported, and that generally means significantly overestimated, as shown in METR’s earlier experiment. But the new report highlights a separate issue: asking how much more quickly you can complete your work using AI today vs pre-AI leads to overestimates of the value gained, because you’re often now using AI to do things that were not really important for your work — say, using AI to quickly build an interactive dashboard to go with your report, or to carry out a more complex analysis. Yes, it would have taken you a lot of time to do these things manually before AI, but they are really bonus extras — those large hypothetical time savings result in only a small increase in the value of your work today.

To address this, METR instead asked three questions that attempt to get at the additional value AI is adding to workers’ outputs:

“If your team had to replace you with people just like you (same skills, same knowledge) except that they did not have access to AI, how many copies of you would it need to hire?”

“How long would it have taken you, in months, to deliver equally valuable work to that which you delivered last month if you had not had access to AI?”

“What fraction of the value you currently deliver could you produce if AI tools became unavailable?”

As expected, when prompted to think about value rather than pure speed, people’s estimates of the gains from AI came in lower, at around 2x instead of 3x, with the third question giving the smallest estimate of 1.6x — an average 60 per cent increase in value from AI.

As the researchers acknowledge, these should really be seen as upper bounds — when they looked closer at some of the most striking self-reported boosts, their assessment of the work produced was that it was highly unlikely to really be as valuable as the respondents said. In another instance, they confirmed that much more work was being done with AI, but whether this constituted much more value was less clear from an objective standpoint. I would add a second reason these estimates should be seen as upper bounds: these types of coding-heavy jobs are more amenable to automation or augmentation with AI than most other work, even in the knowledge economy — indeed, METR found that the less time a participant spent coding, the smaller the AI boost to their work.

Taking the numbers themselves with a large pinch of salt, I think the approach here is useful and raises some interesting questions. For example, who determines the ‘value’ of an individual piece of coding or broader knowledge work? And is the increased value from AI augmentation really always needed or demanded? As they say, what we really need now is surveys of managers (and, I would argue, customers, where applicable).

It’s also useful to think about how this approach would generalise to other jobs or domains. I found it interesting to ask the questions of my own work:

“How long would it have taken you, in months, to deliver equally valuable work to that which you delivered last month if you had not had access to AI?”
Maybe an extra week? Which is a lot! But it’s hard to say. Three days of that I would say were critical to producing the quality of output that I did, but the other two days were arguably spent adding extra bits that perhaps pre-AI me would not have felt necessary, and readers may not have missed. Similarly, I’ve done pieces of work during the AI era where I didn’t use any AI, but I suspect the value perceived by my editors and readers is no less than in the instances where I have used it. Columns are a funny business, and even as a data columnist I’m pretty confident the correlation between column ‘value’ and the amount or complexity of data analysis is modest at best. There are so many variables: relevance to readers, timeliness, quality and originality of argument, quality of charts. AI can sometimes help with some of these things, with others much less so; and even where it can help there can be steeply diminishing returns.

“What fraction of the value you currently deliver could you produce if AI tools became unavailable?”
I would guess roughly 80 per cent, which comes out to a 1.25x multiple, or a 25 per cent boost. Which is nice, as it matches very closely with my estimate of the time savings (an extra week per month is roughly 25 per cent), even though it’s a very different question.
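For what it’s worth, the arithmetic behind those numbers is just a reciprocal: invert the fraction of value you could still deliver without AI and you get the implied multiplier. A minimal sketch (the `ai_multiplier` helper is my own illustration, not anything from METR’s study):

```python
def ai_multiplier(fraction_without_ai: float) -> float:
    """Value multiplier implied by the fraction of value retained without AI.

    If you could still deliver 80 per cent of your current value without AI,
    the implied multiple is 1 / 0.8 = 1.25x, i.e. a 25 per cent boost.
    """
    if not 0.0 < fraction_without_ai <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction_without_ai


# My estimate above: 80 per cent retained without AI
print(ai_multiplier(0.80))             # 1.25

# Working backwards, METR's 1.6x third-question average corresponds
# to respondents saying they would retain about 62.5 per cent of value
print(round(ai_multiplier(0.625), 2))  # 1.6
```

Running the questions in both directions like this is a handy sanity check: a boastful “10x” claim is equivalent to saying you could produce only a tenth of your current value without AI.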
Sarah, as someone not routinely in the business of writing code (with AI or otherwise), did you find much useful in this one?

Sarah writes

I applaud METR’s attempts to get people to think more carefully about whether they’re really “10X-ing themselves”, as some people like to boast on social media. But to be honest, John, I think what these attempts really highlight is that, in the end, individual-level productivity isn’t a particularly meaningful metric. At least, not if you’re working inside a company (rather than solo), and certainly not if you’re trying to think about productivity at the macroeconomic level.

That is because organisations and economies aren’t just groups of individuals all doing their own thing. They are interdependent systems. And that means the value we each generate isn’t always easy for us to see in the round, and doesn’t just depend on us.

In the world of software, a new report from Faros, a software development platform, highlights the issue very clearly (thanks to Jason Gorman of Codemanship for flagging it to us). It is based not on self-reports but on real telemetry data from 22,000 developers on the platform. The report summarises the results thus: “More code. Declining quality. Accelerating incidents.”

The data shows there is indeed a huge boost to productivity at the beginning of the software development process. More code is being generated. More projects are being started. But the subsequent stages of the workflow appear to be slowing down: “As work advances from in progress to review to testing to done, each handoff is a moment when human attention, judgment, and capacity determine what happens next,” the report says. “Across every one of those stages, the time spent is up substantially.”

The result has been a drop in the quality of what makes it through the system. “The code entering production systems is not meeting the bar that engineers once set for themselves,” the report says. “AI-generated code is reaching production, and it is not holding up. Incidents have tripled relative to the low AI adoption baseline.” This could eventually prove costly in terms of corporate reputation, not to mention the labour-hours required to fix problems.

Last year, I wrote about similar findings in Google Cloud’s DORA research programme (which surveys software engineers and tracks performance metrics). That report found that teams which were already high performers were managing these issues much more successfully. But the Faros report pushes back against that finding: “High-performing engineering organizations are experiencing the same downstream deterioration as everyone else.”

This is just one window into what’s happening in a sliver of the software development workforce, but I think it’s a useful reality check. Over time, organisations will probably learn how to change workflows to cope with the volume and quality issues presented by AI. But for now it’s worth remembering: a technology that boosts the productivity of individuals in one part of a company does not necessarily lead to more value being generated by the organisation overall (and can, in fact, even lead to the opposite).

Recommended reading
This was a fun read from Wired: “Meet the Sad Wives of AI”. (Sarah)
On the Persuasion Substack, Talia Barnes has a thoughtful essay on AI’s impact on authenticity. (John)