Sometime in the last year, my job changed and I didn't notice.

I was midway through architecting an agent workflow, sketching out which tools the system should have access to, what context it would need at each decision point, what guardrails would prevent it from going off course. I paused and realized I hadn't written a line of what I would have called "code" in three days. No functions. No classes. No algorithms. I'd been describing desired behavior in structured prompts, designing tool interfaces for agents, reviewing AI-generated code for edge cases the model missed. I was debugging why an agent made a bad decision, which is a different kind of debugging than tracing a bad computation. I was writing evals that test behavior, not just correctness.

The questions had changed more than the answers. I used to ask "how do I implement this?" Now I ask "how do I specify this well enough that a system can implement it?" Two years ago I spent most of my time writing algorithms, debugging memory issues, optimizing queries, wiring up endpoints. The artifacts were code files. The skills were syntactic fluency and algorithmic thinking. Now the artifacts are prompts, evaluation criteria, and architecture diagrams. Specification is a different skill from implementation. It requires thinking about a problem from the outside rather than from the inside. You need to be precise about intent in natural language, which turns out to be much harder than being precise in a programming language.

That moment stuck with me. Not because it was dramatic, but because it was quiet. The shift had happened gradually enough that I hadn't registered it. And once I started looking, I found the shift was part of a pattern that goes back decades.

Fifty Years of "That's Not Real Programming"

In the 1970s, you wrote machine code. Literal opcodes. You thought in registers and memory addresses. Compiled languages gave you variables, functions, control flow. Interpreted languages added garbage collection, dynamic typing. Cloud platforms and infrastructure-as-code went further: you declared what you wanted (a load balancer, a database, an autoscaling group) and the platform figured out how to provision it. Serverless computing meant you didn't even think about servers. And now I tell an AI "build me an API endpoint that handles user authentication with JWT tokens" and it writes 200 lines of working code.

At every step, the people at the previous layer looked at the new one and said "that's not real programming." Assembly programmers said it about C. C programmers said it about Python. And now, engineers who grew up writing code by hand say it about prompt-driven development. They've been wrong every time. Not because the old skills didn't matter, but because programming was always about translating intent into working systems. The medium changed. The core activity, closing the gap between what you want and what the machine does, didn't.

But this time, the skeptics might be half-right.

The Best Argument Against My Own Thesis

Here's the thing I've been avoiding: every previous abstraction step maintained a deterministic mapping from specification to execution. C compiles to the same assembly every time. Python produces the same bytecode every time. Infrastructure-as-code provisions the same resources given the same configuration. You could reason about what your code would do. You could trace cause and effect. You could reproduce bugs.

With LLMs, that breaks. Run the same prompt on the same input twice and you might get two different outputs. The "compiler" is probabilistic. You cannot predict the output from the input with certainty.

The strongest version of the skeptics' argument goes something like this: "What you're describing isn't a higher abstraction level. It's a fundamentally different kind of thing. Every other step on your timeline preserved the ability to reason deterministically about system behavior. You're pointing at a trend line and extrapolating across a discontinuity."

And this argument has real force. My own experience supports parts of it. You can't set a breakpoint in a prompt. You can't step through the execution. When something goes wrong, you're often left staring at an input and an output with no visibility into what happened in between. These aren't missing features that will be added in the next release. They're structural consequences of non-deterministic execution.

So where does the argument go wrong? I think the mistake is in treating determinism as the defining feature of engineering rather than one feature of a particular era of engineering. Distributed systems engineers already work with non-determinism every day: network partitions, message reordering, eventual consistency. You can't predict when a packet will arrive or whether a node will be reachable. Civil engineers work with probabilistic models for earthquake resistance and wind loads. They don't know which earthquake will hit or when. They engineer for distributions of outcomes, not guaranteed ones.

The question isn't whether non-determinism is new to engineering. It isn't. The question is whether the degree of non-determinism in LLM outputs is manageable with existing engineering approaches, or whether it requires genuinely new methods. In distributed systems, we developed tools over decades: consensus protocols, idempotency guarantees, circuit breakers, observability platforms. The equivalent toolkit for probabilistic compilation barely exists yet. We're running evals the way early distributed systems engineers ran manual smoke tests, and hoping for the best.
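What would porting one piece of that toolkit look like? Here is a minimal sketch of a circuit breaker wrapped around a model call, the distributed-systems pattern applied to probabilistic compilation. Everything here is hypothetical: `model_fn` stands in for whatever client you actually use, and "failure" is whatever your validation decides it is.

```python
# A circuit breaker for a probabilistic "compiler": after repeated
# consecutive failures, stop calling the model and fail fast instead
# of burning retries on a call that keeps going wrong.

class CircuitBreaker:
    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    @property
    def open(self):
        # An "open" circuit means we have given up on the model for now.
        return self.consecutive_failures >= self.failure_threshold

    def call(self, model_fn, prompt):
        if self.open:
            raise RuntimeError("circuit open: model has failed repeatedly")
        try:
            output = model_fn(prompt)
        except Exception:
            self.consecutive_failures += 1
            raise
        # Any success closes the circuit again.
        self.consecutive_failures = 0
        return output
```

The interesting design question is the one the sketch dodges: in a deterministic system, "failure" is an exception or a timeout. Here it also includes "the output parsed fine but was wrong," which is exactly the part the existing toolkit doesn't cover.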

I think the skeptics are half-right. This is qualitatively different from previous abstraction steps. The determinism boundary is real, and I feel it in my daily work. And I think they're wrong that it isn't engineering. It's engineering that requires new tools and new instincts, and we're still in the early stages of developing both. I don't have full confidence in that position, but that's where I land today.

Prompts Are Code

Prompts are code. Not metaphorically. They have inputs, logic, outputs, and error handling. The difference is that the compiler is probabilistic.

Here is an actual system prompt I wrote last month for a code review agent:

You are reviewing a pull request for security vulnerabilities.
Input: the full diff, plus the file-level context for each changed file.
For each finding: state the vulnerability class (OWASP top-10 where applicable),
cite the exact line range, suggest a fix, rate severity 1-5.
If the diff touches authentication or authorization logic, apply strict mode:
flag ANY change that widens access, even if it looks intentional.
If you find zero issues, say so, do not invent problems.

That has input specs, conditional logic, output schema, and an explicit instruction against a known failure mode (hallucinating issues to seem useful). It functions like a function definition. But it compiles differently every time you run it.

This is what makes the work feel genuinely new. When I write a Python function, I can unit test it and be confident the test result will hold for every future invocation with the same input. When I write a prompt like the one above, I test it against 50 inputs and get acceptable results on 47. Is that good enough? For security review, probably not. For first-pass code summarization, probably yes. The engineering judgment isn't about correctness anymore. It's about acceptable variance.
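That "47 out of 50" judgment can be made explicit in a small eval harness. This is a sketch, not a real framework: `run_prompt`, `grade`, and the per-use-case thresholds are all stand-ins I'm inventing for illustration.

```python
# A minimal eval harness for a prompt: run it over a fixed set of
# inputs, grade each output, and compare the pass rate against a
# threshold chosen per use case.

def pass_rate(run_prompt, grade, cases):
    """Fraction of cases whose graded output is acceptable."""
    passed = sum(1 for case in cases if grade(run_prompt(case)))
    return passed / len(cases)

def acceptable(rate, use_case):
    # The engineering judgment lives here: the same pass rate can be
    # fine for one use case and unacceptable for another.
    thresholds = {"security_review": 0.99, "summarization": 0.90}
    return rate >= thresholds[use_case]
```

With the numbers from above, `acceptable(47 / 50, "summarization")` passes and `acceptable(47 / 50, "security_review")` fails, which is the whole point: the eval doesn't tell you whether the prompt is correct, it tells you whether the variance is acceptable for this job.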

I've come to think this style of engineering is closer to designing distributed systems than to traditional programming. In distributed systems, you expect things to fail. You expect messages to arrive out of order. You build for resilience, not for perfection. You think about "what's the worst that could happen here?" rather than "does this produce the right output?"

The debugging tools for this new kind of code are primitive. When your "code" is a prompt, the context window is your execution environment, and you have limited tools for inspecting what happens inside it. When something goes wrong, you often can't distinguish between a bad specification (your prompt was ambiguous), a bad execution (the model misinterpreted a clear prompt), and a bad evaluation (the output was actually fine and you misjudged it). In traditional programming, those failure modes are distinct and diagnosable. Here they blur together.
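One crude way to start separating those failure modes, under the assumption that rerunning is the only instrument you have: sample the same input several times and look at the consistency of the failures. Consistent failures point toward the specification; inconsistent ones point toward execution variance. This is a heuristic sketch with hypothetical names, not a real diagnostic tool, and it cannot catch the third case, where your grader itself is wrong.

```python
# Triage a failing prompt by rerunning the same input several times.
# All failures  -> the prompt is probably ambiguous or wrong (spec).
# Mixed results -> the model's sampling is doing the damage (execution).

def triage(run_prompt, grade, case, runs=5):
    outcomes = [grade(run_prompt(case)) for _ in range(runs)]
    if all(outcomes):
        return "passing"
    if not any(outcomes):
        return "likely specification problem"
    return "likely execution variance"
```

It's a blunt instrument, and at temperature zero with a cached model it tells you nothing. But it's roughly what "setting a breakpoint" degrades to when the execution environment is a context window.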

This creates a new engineering discipline we don't have good names for yet. It's not prompt engineering in the way that term is usually used (clever tricks for getting better outputs). It's systems engineering applied to probabilistic execution: writing instructions that produce acceptable results across the range of possible executions. That's a hard problem, and I don't think we've developed the right mental models for it yet.

The Meta-Skill Is Knowing When NOT to Go Meta

The most important skill isn't going meta. It's knowing when to stay concrete.

Every layer of abstraction trades control for productivity. Assembly gives you total control. Python gives you less control and more productivity. Prompting gives you almost no control and enormous productivity for the right problems.

The instinct, especially for engineers excited about AI, is to always go up. Why write the code when you can prompt for it? Why write the prompt when you can build a system that generates prompts?

I've watched this instinct go wrong. I saw a team build what they called an "agent orchestration framework." The idea was reasonable: instead of building each agent workflow from scratch, create a framework that could define, compose, and manage arbitrary agent workflows. Three months in, the framework had its own DSL, its own middleware system, its own retry and fallback architecture, its own evaluation pipeline. The engineering was impressive. The framework had been used to build exactly one workflow. The one they'd started with.

When they tried to use it for a second workflow, half the abstractions didn't apply. The DSL couldn't express the control flow they needed. The middleware handled the wrong kinds of preprocessing. They'd built a framework optimized for exactly one use case, the one they had in front of them while building the framework. The generality was an illusion. The test is always the same: does the abstraction reduce effort for real problems that actually exist? If you can't point to at least two or three different use cases the abstraction serves, you're building complexity, not reducing it.

I've made the same mistake. The pull toward abstraction is strong. Building frameworks feels more important than building applications. But complexity isn't value. Solving the problem is value.

The best engineers I've worked with in this new environment share a specific trait: they move fluidly between abstraction levels. They can prompt an AI to scaffold a project, then open the debugger and trace a specific failure through the call stack. They don't have a home level. They pick the level that matches the problem. The worst engineers are the ones stuck at one level, whether that's refusing AI or refusing to go lower. Both failure modes come from the same place: treating a particular abstraction level as an identity rather than as a tool.

The Bootstrap Question

The first compilers were written in assembly. The first C compiler was bootstrapped through B and transitional dialects until C could compile itself. Each layer of the abstraction timeline was bootstrapped by the layer below it.

AI coding assistants were written by human programmers. Humans wrote the training pipelines, curated the data, designed the architectures, debugged the systems. But that's starting to shift. AI is now helping write better AI coding assistants. The bootstrap loop isn't closed yet, but you can see the arc.

What does engineering become when your tools can improve themselves?

I don't think we're there yet. The improvements are still incremental, still needing human direction, human evaluation, human judgment about what "better" means. But the trajectory is visible, and it creates a strange vertigo when you think about it carefully. Think of a biological analogy: evolution produced brains, and brains produced genetic engineering. The system that created the optimizer is now being modified by the optimizer. We're at an early version of that loop with AI. The difference is the iteration cycle. Evolution took millions of years. We're watching it happen in months.

I want to resist the urge to resolve this. The bootstrap question is genuinely open, and pretending I know where it leads would be dishonest. What I can say is that it's already changing the texture of daily engineering work.

What Does This Mean for Engineering Identity?

If I don't write code anymore, am I still an engineer?

The question sounds melodramatic written out like that. But I've felt it. There are days when I've spent eight hours designing agent workflows, writing evaluation criteria, debugging decision-making failures, and when I stop working some part of my brain says "but you didn't build anything." That's the old definition talking. The one where building means typing code into a text editor and watching it compile.

I used to identify as someone who writes code. Those skills took years to develop. They shaped how I think. And now many of them sit unused on any given day. The new skills are real, but they feel different. Understanding what context a system needs. Specifying behavior precisely in natural language. Evaluating probabilistic outputs. Designing for resilience in non-deterministic systems.

And yet. There's something about the tactile feedback of writing code and watching it run, about the determinism of "I wrote this and it does exactly what I told it to do," that the new way of working doesn't replicate. I don't want to romanticize that. I've spent enough hours debugging null pointer exceptions to know that deterministic code is only deterministic until it isn't.

But I notice the loss, and I think it's worth naming rather than pretending it doesn't exist. Something changed. The new thing might be better in most measurable ways. It's also different in ways that affect how the work feels, and how the work feels is part of what makes it a vocation rather than just a job.

Predictions

I predict that within 5 years, a majority of professional engineers will spend less than 30% of their working time writing code in a traditional IDE. The rest will be specification, evaluation, and architecture work at higher abstraction levels. I base this on the rate of change in my own work: 18 months ago I wrote code roughly 70% of the day. Now it's closer to 40%. If that trajectory continues and is representative, 30% within 5 years is conservative. If code-writing percentages plateau at current levels for 2+ years, I'd revise this. The abstraction trend could have a ceiling I'm not seeing.

Two more predictions I'm willing to stake out:

By 2028, the most common debugging activity for software engineers will be evaluating AI-generated outputs rather than tracing execution paths in code. The core question will shift from "why did this line produce the wrong value?" to "why did this system make the wrong decision?" If debugging still primarily means stepping through code with a debugger in 2028, the abstraction shift is slower than the historical pattern suggests.

Within 3 years, "specification writing" (structured prompts, agent workflows, evaluation criteria) will be a recognized sub-discipline with its own best practices, tooling, and job titles. If it remains informal and ad-hoc, either the probabilistic compilation model is wrong or the field is earlier than I think.

If none of these happen, I've misjudged either the pace or the direction. I'm less certain about the timelines than about the direction. But I'd rather make predictions that can be checked than gesture vaguely at "the future of engineering" without committing to anything specific.

Where This Leaves Me

I'm still learning to be the kind of engineer this moment needs. Some days I lean too far toward the meta, spending hours on architecture that could have been a simple script. Some days I lean too far toward the concrete, hand-writing code that an AI could have produced faster and better. The skill isn't in finding the right level and staying there. It's in the constant adjustment.

Last Tuesday I spent the morning designing a three-agent pipeline for processing research papers. By 2pm I was staring at a stack trace in Python, manually tracing a race condition in the async queue that connected two of the agents. By 4pm I was back in a prompt, rewriting the error-recovery instructions because the agent kept retrying failures that were permanent. Three levels of abstraction in one afternoon. That felt like engineering. That felt like the job now.