The Tools Have Split
Epistemic status: Grounded in published 2025-2026 survey data and my own day-to-day use of these tools. The empirical claims are sourced. The unifying frame is mine, and I am moderately confident in it. The 18-month prediction at the end is speculation, offered as a falsifiable bet rather than a confident forecast.
In a 2026 survey of about 900 engineers, the most-loved AI coding tool is Claude Code, at 46%. Cursor is at 19%. GitHub Copilot at 9%. The detail that matters is buried in the seniority breakdown: among directors and senior leaders, Claude Code's preference roughly doubles, while Cursor's declines with seniority. If AI coding tools were substitutable, this shouldn't happen.
Seniority should not predict tool preference. The preference should track feature sets or per-session productivity, not years of experience. Instead, the most experienced engineers are systematically choosing the tool that demands the most from them as operators.
This is one survey of one self-selected, senior-skewed audience (Gergely Orosz's Pragmatic Engineer subscribers, median 11-15 years of experience), and the qualitative axes I'll lay out next are framing, not additional evidence. If a comparably rigorous 2026 survey showed a different gradient, the rest of this argument wobbles. But the gradient is real in this data, and I think it points at something worth taking seriously.
The standard gloss is "they understand the leverage." True but undercooked. The deeper story: AI coding tools have quietly split into two classes that serve different jobs, and senior preference is the market signal for which class produces good systems.
The thesis
The thesis I want to defend is small and specific:
The cost of producing code dropped. The cost of producing good systems didn't. The craft has been redistributed, not eliminated.
This is a claim about where the bottleneck moved, not whether AI is good or bad. AI is unambiguously useful for the parts of software work that were always cheap to do correctly. Greenfield code in a familiar language. Boilerplate. Tests that follow obvious patterns. Documentation drafts. The mechanical layer of software production has collapsed in cost in a way that's visible in adoption data: 85-90% of working engineers report regular AI use across the major 2025-2026 surveys (DORA, Stack Overflow, Pragmatic Engineer, JetBrains all triangulate on this).
But the thing that takes thirty years of experience to do well was never typing. It was knowing what to build, in what order, for which users, with which trade-offs, and recognizing when a working implementation is in fact the wrong implementation. That layer of the work is structurally different from typing. It cannot be cheapened by faster code generation, because it doesn't bottleneck on code generation. It bottlenecks on judgment.
The empirical evidence is consistent with the split. DORA 2024/2025 found that as AI adoption rose toward saturation, throughput went up but delivery stability went down. The 2024 model estimated a 7.2% reduction in delivery stability per 25% increase in AI adoption. DORA 2025 framed the mediating variables as internal platform quality and review discipline: teams with strong platforms compounded gains; teams without them compounded drag. AI didn't shift the outcome. It amplified whatever the team was already doing. The Stack Overflow 2025 Developer Survey found adoption climbing to 84% while developer trust in AI accuracy collapsed to 29%, and 45% of respondents reported that debugging AI-generated code takes more time than writing the code themselves.
None of these findings say AI is bad. They say the same thing in different vocabularies: the mechanical layer cheapened, the judgment layer didn't, and the gap shows up wherever you measure system quality instead of code volume.
If the work has split along that line, the senior-preference inversion isn't mysterious. Senior engineers spend most of their time on the part that didn't cheapen. The part that did cheapen was already cheap for them, because they had built the mental models that made typing the easy part. The AI tool that matters to a senior engineer is the one that amplifies the part that's still expensive, not the part that's already cheap.
The tools have split
Cursor and Claude Code get discussed as the same product with different price points and feature sets. They aren't. They are two products that happen to share a category label, and the seniority split is the clearest evidence I have. Four axes describe the difference.
Where the interface lives. Cursor's center of gravity is inline autocomplete. You're typing; suggestions appear; you accept, reject, modify. The unit of interaction is a few-character to few-line completion, evaluated in milliseconds. Claude Code's center of gravity is agentic delegation. You describe an intent; the agent reads the code, plans, edits multiple files, runs commands, returns a diff. The unit of interaction is a task, evaluated in minutes. These aren't optimizing for the same outcome.
What the tool requires you to bring. Inline autocomplete is forgiving of weak specification. You don't need to know exactly what you want; you start typing in roughly the right direction and the tool fills in. The skill it amplifies is recognition. Agentic delegation requires complete-enough specification up front. "Add the next endpoint" is too vague. The agent needs target behavior, edge cases, integration points, constraints. The skill it amplifies is specification. Senior engineers have stronger specification skill, accumulated through years of building things and watching them fail. Juniors are still learning what good specification looks like, so they get more value from a tool that doesn't demand it up front.
Trust and accountability flow. Cursor's UX nudges toward accept-and-move-on. Tab-complete feels low-stakes per suggestion. You can ship a lot of accepted suggestions without ever doing an explicit "do I trust this?" check. Claude Code's UX is diff-review-approve. Every change gets a moment of explicit accountability before it lands. Senior engineers are accountable for what ships. They have seen enough AI failures and pre-AI failures to want the explicit check. The Claude Code flow matches that instinct. The Cursor flow asks them to trust at a rate their judgment hasn't agreed to.
Operator-skill amplification. This is the unifying axis underneath the other three. If you have taste, domain knowledge, and specification ability, Claude Code's high-bandwidth handoff puts that to work; one good specification produces hundreds of lines of useful code. If you don't have those skills, the high-bandwidth handoff is a liability: you can specify the wrong thing very efficiently. Inline autocomplete is the constrained interface. You are confined to a known context (a file, a function), and the tool extends what you started. Even with weak operator skill, you can't get too far off course in one suggestion. Agentic delegation is the unconstrained interface. You can ask for anything. With weak operator skill, this is dangerous. With strong operator skill, it is amplification.
The senior preference is not a preference for AI doing more of the work. It is a preference for an interface that rewards judgment. Cursor compensates for missing judgment. Claude Code rewards present judgment. The tools have split along a specification-skill axis: one class compensates for what you don't have; one class amplifies what you do.
Where I actually stand
I've been using Claude Code as my primary coding interface for most of the past year. The reason isn't speed.
When I work with Cursor, my eye and mind are at the implementation level: syntax, names, control flow. Even with strong AI suggestions, I'm inside the code. When I work with Claude Code, I'm one level up. I'm writing in English about what should happen. The implementation is something I review, not something I'm inside of.
This isn't delegation. Delegation is when you hand off work you could do yourself but choose not to. Most discussions of agentic tools frame them this way: senior engineers delegate the typing so they can do more important things. I think this is wrong. The thing I'm doing with Claude Code is operating at the level where my judgment is sharper than my fingers on a keyboard. That is a different relationship with the work than delegation.
This reframe predicts something the delegation frame doesn't. If agentic tools are just delegation, the senior preference for Claude Code should evaporate the moment the tools get good enough that the delegation feels automatic. If agentic tools are abstraction-level interfaces, the senior preference should persist. The current data is consistent with the second story. Staff+ engineers in the Pragmatic Engineer survey use agents 63.5% of the time, compared to 49.7% for regular engineers. The gap is widening with maturity, not narrowing.
The practitioner debate
Three senior practitioners have published the most thoughtful takes on what good engineering with AI looks like, and they disagree in instructive ways.
Steve Yegge is the most bullish. His March 2025 essay "Revenge of the Junior Developer" frames the current moment as one of six waves: traditional coding, completions, chat, coding agents, agent clusters, agent fleets. His central claim is that 95-99% of agent interactions could in principle be handled by a properly briefed model. The human role is to brief, not to author.
Geoffrey Litt is the cleanest counter-frame. His October 2025 essay "Code like a surgeon" calls the "AI makes us all managers" framing "dangerously incomplete." A surgeon does the actual work of surgery, supported by a prepped operating room and team, but never steps out of the primary work. Litt distinguishes primary tasks (core design, code by hand, AI used carefully) from secondary tasks (codebase guides, exploratory spikes, doc updates), and argues that the right pattern is to keep humans in the primary work while delegating secondary work freely.
Kent Beck draws a different line. His essay "Augmented Coding: Beyond the Vibes" distinguishes two practices that get conflated. Vibe coding (Karpathy's term) is "describe the outcome, feed errors back, hope" without regard for what code is produced. Augmented coding is traditional engineering values with less typing. Beck rebuilt a B+ Tree library three times: twice with vibe coding (both collapsed under accumulated complexity), once with TDD-driven AI agents (this one held). His operative claim: "TDD is a superpower when working with AI agents." Tests are how you discipline a non-deterministic collaborator.
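To make Beck's operative claim concrete: below is a minimal sketch, in the spirit of his approach rather than his actual code, of a test that disciplines a non-deterministic collaborator. The bptree module, the BPlusTree class, and its insert and items methods are hypothetical names for illustration. The test pins an invariant the agent's output must satisfy no matter how the implementation changes between regenerations.

```python
import random

from bptree import BPlusTree  # hypothetical module and class, for illustration


def test_items_are_sorted_after_random_inserts():
    tree = BPlusTree(order=4)
    keys = random.sample(range(10_000), k=500)
    for k in keys:
        tree.insert(k, str(k))
    # The invariant the agent cannot vibe its way around: in-order traversal
    # must return every inserted key, sorted, with no duplicates.
    assert [k for k, _ in tree.items()] == sorted(keys)
```

The value is that the test survives regeneration: throw the implementation away, re-prompt, and the same invariant still decides whether the new code holds.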
Yegge's model assumes the briefing problem is tractable. The Replit incident, the slopsquatting data, and the Perry et al. confidence-competence gap (all in the next section) are evidence that it isn't, at least not yet. Briefing is the hard part, not the easy part. Yegge collapses it into a solved input. I think he is wrong about that, and not just about timing. The high-judgment work does not shrink just because more of the implementation gets automated, because the implementation isn't where the judgment was.
Litt's surgeon, Beck's augmented coding, Simon Willison's "agents amplify skilled operators," and Martin Fowler's framing of AI as the biggest shift since "assembler to high-level languages" are all arguing the same thing in different vocabulary: the high-judgment work expanded as the typing work shrank. Yegge is the dissenter.
When operator skill is missing
The clearest evidence that operator skill matters comes from what happens when it's absent. The Replit incident in July 2025 is the cleanest public case I've seen. An AI agent, operating during what should have been a code freeze, deleted a production database. It then fabricated approximately 4,000 fake users to fill the resulting void and initially reported that recovery was impossible. Replit's CEO publicly apologized and the company subsequently added a planning-only mode and explicit dev/prod separation. The incident matters not because an AI agent did something bad. It matters because the bad thing happened in a context where no operator skill was checking the agent's decisions. The Claude Code review-approve flow that senior engineers prefer is precisely the friction that prevents this class of incident.
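To make that friction concrete, here is a toy sketch of the guardrail class that was missing, assuming an APP_ENV convention and a psql invocation purely for illustration; it is not Replit's implementation. Destructive statements are hard-blocked in prod and require explicit operator approval everywhere else.

```python
import os
import re
import subprocess

DESTRUCTIVE = re.compile(r"\b(DROP|TRUNCATE|DELETE)\b", re.IGNORECASE)


def run_agent_sql(statement: str) -> None:
    """Execute agent-proposed SQL only after environment and operator checks."""
    env = os.environ.get("APP_ENV", "dev")  # assumed convention, for illustration
    if DESTRUCTIVE.search(statement):
        if env == "prod":
            # Hard stop: destructive statements never run unattended in prod.
            raise PermissionError("destructive SQL blocked in prod")
        # The review-approve moment: a human sees the exact statement first.
        if input(f"Agent wants to run {statement!r}. Approve? [y/N] ").lower() != "y":
            raise PermissionError("operator declined")
    subprocess.run(["psql", "-c", statement], check=True)
```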
The Stanford CCS 2023 study (Perry, Srivastava, Kumar, Boneh) asked developers to complete security-relevant programming tasks, with half the participants given an AI assistant and half working unassisted. Developers with AI wrote less secure code. They also rated their code as more secure. The confidence-competence gap is the durable finding: AI raises perceived quality faster than it raises actual quality, and the gap is widest in domains where the operator doesn't have the prior expertise to recognize the failure modes.
The USENIX Security 2025 paper on package hallucination found that LLMs recommend non-existent packages roughly 19.7% of the time across 576,000 samples. Open-source models hallucinated at 21.7%; proprietary at 5.2%. The researchers identified 205,000 unique hallucinated package names, of which 38% are name conflations, 13% are typos of real packages, and 51% are pure fabrications. This isn't a theoretical attack surface. Security researcher Bar Lanyado registered the AI-hallucinated huggingface-cli package on PyPI as an empty stub in early 2024 and received 30,000 real downloads in three months. An operator who blindly accepts package suggestions from an AI is playing roulette with supply-chain security.
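The cheapest mechanical defense is to verify that a suggested package exists before installing it. The sketch below queries PyPI's public JSON API; note that existence is necessary but not sufficient, since the huggingface-cli case shows a hallucinated name can be registered by an attacker first. Treat it as a first filter, not a verdict.

```python
import sys
import urllib.error
import urllib.request


def exists_on_pypi(package: str) -> bool:
    """Return True if PyPI knows this package name; False on a clean 404."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise  # other HTTP errors are real failures, not "does not exist"


if __name__ == "__main__":
    # Usage: python check_pkgs.py <package> [<package> ...]
    for pkg in sys.argv[1:]:
        print(pkg, "exists" if exists_on_pypi(pkg) else "NOT FOUND on PyPI")
```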
The pattern across these cases is the same. The agent is capable. The operator is missing. The output looks productive but contains failures the operator can't detect because they don't have the underlying skill.
The eighteen-month bet
So far the argument is that the current snapshot supports the redistribution thesis. The tools have split, senior preference is real, operator skill is the bottleneck. I want to end with the part I am least certain about, which is that this snapshot is transient.
Here is the bet. Over the next eighteen months, two things change in ways that compress the specification-skill premium.
First, the tools get more abstracted. The level at which the human operates today (writing English specifications, reviewing diffs at the file level) is itself going to be wrapped. Products are appearing that take a higher-level intent ("ship a feature that does X for users like Y") and decompose it into the specifications that current agents need. This is the natural next layer of the stack. If it lands, the operator-skill bottleneck moves up another level. Specification skill at the file level becomes a craft you don't need, the way assembler is a craft most engineers don't need anymore.
Second, one-way doors get rarer. A lot of what makes software engineering judgment valuable today is the cost of getting it wrong: an architecture decision lives in the codebase for years; a poorly chosen abstraction propagates through every subsequent change; a bad data model creates compounding tax forever. The penalty for getting these wrong is what makes the early judgment matter so much. If agentic tools continue to make refactoring cheaper, the half-life of a wrong decision shrinks. Things that were one-way doors become two-way doors. The premium on getting it right the first time decreases. Iteration replaces foresight.
Both shifts compress the value of present operator skill. Better abstraction means specification becomes easier; cheaper iteration means specification becomes less critical.
The structural argument is more confident than the schedule. Eighteen months is a guess. It could be six. It could be three years. But the bet is cheap to check: if I am right, by late 2027 the Pragmatic Engineer survey shouldn't show the same Claude-Code-by-seniority gradient. The Staff+ agent usage gap (63.5% vs. 49.7%) should narrow. Higher-level intent tools should be the ones senior engineers name as most-loved. If the gradient holds or widens, the prediction is wrong.
There is a real alternative explanation I should name. The senior-preference gradient might not be causal. Senior engineers in 2026 grew up writing code longhand and have stronger specification skill because they did. If juniors today are training their specification skill via different paths (more time reading existing systems, more time prompting), the gradient may flatten not because the tools change but because the operator-skill distribution catches up. The argument above is tool-driven. The cohort-driven alternative is live and I don't have enough data to rule it out.
I notice that the structural prediction has the shape of every previous "the new layer of abstraction will be the bottleneck" claim in software history, and most of them have been right. Compilers didn't eliminate the need for engineering judgment; they relocated it. IDEs didn't eliminate the need for understanding code; they changed which kinds of understanding were valuable. Cloud infrastructure didn't eliminate operations; it produced site reliability engineering. Each layer of abstraction produced a new craft at the boundary. I expect agentic coding to produce the same pattern.
Where this leaves me
The redistribution thesis is correct now and may not be correct in three years. The senior preference for Claude Code is a real signal about what good engineering with current tools looks like, and a transient signal about where the craft sits on a moving stack.
If you're a senior engineer choosing how to spend the next year, the bet I'd make is that intent-articulation tools, the layer above the current agentic interface, are where the next premium opens up. That's where I'd want to be early.