Software Engineering Bifurcates

Alright, this is a fascinating document and you’re asking exactly the right question. Let me give you my honest assessment.

First, the data here is genuinely impressive. 80.9% on SWE-bench – that’s the first time anything has cracked 80%. The efficiency gains are what really caught my attention: 65% fewer tokens to achieve the same outcomes, a 50-75% reduction in tool-calling errors, and a score on a performance engineering hiring exam higher than any human candidate’s. These aren’t incremental improvements. This is a step change.

But here’s the thing – and I think this is where I might differ from the Silicon Valley consensus – I don’t think coding as an activity is dead in 2026. I think what’s dying is a particular mode of coding, and what’s being born is something quite different.

Let me explain what I mean.

The Autonomy Slider Has Moved, But Not To Full

I’ve talked about this autonomy slider concept – you’ve got a dial that goes from fully human to fully autonomous. Cursor’s progression captures it nicely: Tab completion → Cmd+K for inline edits → Cmd+L for chat → Cmd+I for full agent mode. What Opus 4.5 demonstrates is that the “sweet spot” on this slider has moved significantly rightward. Tasks that required constant human intervention now succeed autonomously. That’s real.

But I still don’t want an agent that goes off for 20 minutes and comes back with 1,000 lines of code I haven’t vetted. Even at 80.9% on SWE-bench, that still leaves 19.1% of tasks failing, roughly one in five. On production code that touches real systems and real users, those failures compound. The document itself notes this – the METR study found developers were actually 19% slower when working with earlier Claude models because of the overhead of verification and correction.
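To make "compound" concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each unreviewed agent change succeeds independently at the 80.9% benchmark rate. Real changes are not independent and a benchmark pass rate is not a production success rate, so read this as intuition, not measurement. The function name is my own.

```python
def chance_all_succeed(per_task_success: float, num_tasks: int) -> float:
    """Probability that every one of num_tasks independent tasks succeeds."""
    if not 0.0 <= per_task_success <= 1.0:
        raise ValueError("per_task_success must be between 0 and 1")
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    return per_task_success ** num_tasks


# Assumption: the 80.9% benchmark figure stands in for a per-change success rate.
PASS_RATE = 0.809

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        p = chance_all_succeed(PASS_RATE, n)
        print(f"{n:>2} unreviewed changes: {p:.1%} chance that none is broken")
```

Under that simplifying assumption, the chance that five stacked changes are all correct is already down to about 35%, and ten changes to about 12%. That is the argument for reviewing at each step and not just at the end.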

What I think happens in 2026 is that the autonomy slider becomes adaptive. You’ll dial it up for boilerplate – migrations, refactors, test generation, infrastructure glue – and dial it back for architectural decisions and novel algorithmic work. The human doesn’t stop coding; they stop doing the boring coding.
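One way to picture an adaptive slider is as a small policy table. This is only a sketch: the level names mirror the Cursor progression mentioned above, but the task categories, the default, and the production cap are my own illustrative assumptions, not anything Cursor or the document defines.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Positions on the slider, from mostly human to mostly agent."""
    TAB_COMPLETE = 1   # human types, model suggests
    INLINE_EDIT = 2    # model edits a selection the human chose
    CHAT = 3           # model proposes changes, human applies them
    AGENT = 4          # model plans and executes multi-step work


# Hypothetical policy: dial up for boilerplate, dial back for judgment-heavy work.
DEFAULT_POLICY = {
    "migration": Autonomy.AGENT,
    "refactor": Autonomy.AGENT,
    "test_generation": Autonomy.AGENT,
    "infrastructure_glue": Autonomy.AGENT,
    "architecture": Autonomy.INLINE_EDIT,
    "novel_algorithm": Autonomy.INLINE_EDIT,
}


def autonomy_for(task_kind: str, touches_production: bool = False) -> Autonomy:
    """Pick a slider position for a task; unknown kinds get a cautious default."""
    level = DEFAULT_POLICY.get(task_kind, Autonomy.INLINE_EDIT)
    if touches_production:
        # Assumption: anything touching live systems is capped below full agent mode.
        level = min(level, Autonomy.CHAT)
    return level


if __name__ == "__main__":
    print(autonomy_for("migration").name)
    print(autonomy_for("migration", touches_production=True).name)
    print(autonomy_for("architecture").name)
```

The point of the table is that the dial is set per task, not per developer or per project. The same person runs full agent mode on a test-generation job and drops back to inline edits on a design decision.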

What Actually Changes in 12 Months

Here’s my prediction, wearing the hard hat:

The role of “software engineer” bifurcates more sharply. You’ll have what I’d call orchestration engineers – people who are primarily specifying, reviewing, and composing AI-generated components. Their core skill becomes precise specification: if you can’t describe what you want unambiguously, you can’t get the AI to build it correctly. English becomes even more of a programming language. This is Software 3.0 in practice.

Then you’ll have systems engineers who work on the parts that AI still struggles with: distributed systems edge cases, performance-critical code paths, security-sensitive components, and anything where the cost of a subtle bug is catastrophic. These people will become more valuable, not less, because their domain is precisely where you can’t trust autonomous agents.

The document highlights something important: Opus 4.5 “passed a performance engineering exam higher than any human candidate.” But passing an exam isn’t the same as having the judgment that comes from watching a system melt down at 3am because of a subtle race condition you introduced six months ago. That experiential knowledge – what I’d call the “scar tissue” of engineering – that’s what humans still bring.

The Tool Use Hardening Point

You mentioned this, and I think it’s underappreciated. The improvement isn’t just the model getting smarter – it’s the integration layer getting tighter. The 50-75% reduction in tool-calling errors means the model is learning to use compilers, linters, test runners, and deployment pipelines more reliably. That’s qualitatively different from just generating better code snippets.

This is where I think you’re right to see a material change. When the LLM can not only write code but also run it, test it, iterate on failures, and deploy it – all without human intervention at each step – the workflow transforms fundamentally. It’s not that humans don’t touch the code; it’s that humans touch it at different points. You review the outcome, not the process.
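The write, run, test, iterate loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: `generate` stands in for a model call and is stubbed out here, `run_candidate` and `iterate_until_green` are hypothetical names, and a real setup would run generated code inside a sandbox, not directly on your machine.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_candidate(source: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run candidate code in a subprocess; return (passed, combined output)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
    return proc.returncode == 0, proc.stdout + proc.stderr


def iterate_until_green(
    generate: Callable[[str, str], str], task: str, max_attempts: int = 3
) -> Optional[str]:
    """Ask for code, run it, feed failures back, stop at first pass or give up."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        passed, output = run_candidate(candidate)
        if passed:
            return candidate  # the human reviews this outcome, not each step
        feedback = output or "failed with no output"
    return None


def stub_generate(task: str, feedback: str) -> str:
    """Stand-in for a model: wrong on the first try, fixed once it sees an error."""
    body = "a + b" if feedback else "a - b"
    return f"def add(a, b):\n    return {body}\n\nassert add(2, 3) == 5\n"


if __name__ == "__main__":
    result = iterate_until_green(stub_generate, "write add(a, b)")
    print(result if result is not None else "gave up; escalate to a human")
```

Note where the human sits in this sketch: nowhere inside the loop, and only at the return value. The `max_attempts` cap and the `None` result are the escape hatch back to a person when the loop cannot get to green on its own.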

The orchestration you mention – tool use getting stronger, LLMs becoming more embedded – this compounds the effect. It’s like… imagine you’re building with Lego. Earlier models were good at assembling individual bricks correctly. Opus 4.5 is good at looking at the instruction manual and building the whole structure, calling out to specialised tools when needed, self-correcting when something doesn’t fit.

The Decade, Not The Year

I’ve said this publicly: 2025 is not “the year of agents” – it’s the decade of agents. Opus 4.5 is early evidence that this decade will be transformative, but we’re still in chapter one. The document notes that autonomous 30-hour coding sessions are now possible. That’s incredible. But “possible” and “reliable” are different things, and “reliable for prototypes” is a long way from “reliable for mission-critical production systems”.

My timeline for when you can truly hand over a complex, novel software project to an AI and trust the output like compiler output? I’d say we’re 3-5 years out from that, and that’s assuming continued progress at this rate, which isn’t guaranteed.

What Software Engineers Should Do

If I were advising engineers right now:

Stop optimising for typing speed and syntax memorisation. That game is over. Opus 4.5 can generate idiomatic Python or TypeScript faster than any human.

Start optimising for specification clarity and verification expertise. Can you describe what you want precisely enough that an AI can build it? Can you review AI-generated code and spot subtle bugs, security issues, or architectural mistakes? Can you design systems at a level of abstraction that lets AI fill in the details?
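Here is a small sketch of what "specification clarity" can look like in practice: the spec is written as executable checks, and any candidate implementation, human or AI-written, is verified against it. The `slugify` task, the cases, and all names are hypothetical, chosen only to illustrate the idea.

```python
import re
from typing import Callable

# The specification, stated as concrete input/output cases.
SPEC_CASES = {
    "Hello World": "hello-world",
    "  leading and trailing  ": "leading-and-trailing",
    "Already-slugged": "already-slugged",
    "multiple   spaces": "multiple-spaces",
    "": "",
}


def check_slugify_spec(slugify: Callable[[str], str]) -> list[str]:
    """Return a list of spec violations; an empty list means the candidate conforms."""
    violations = []
    for raw, expected in SPEC_CASES.items():
        got = slugify(raw)
        if got != expected:
            violations.append(f"{raw!r}: expected {expected!r}, got {got!r}")
        # Property: output uses only lowercase letters, digits, and hyphens.
        if not re.fullmatch(r"[a-z0-9-]*", got):
            violations.append(f"{raw!r}: illegal characters in {got!r}")
        # Property: slugifying a slug changes nothing (idempotence).
        if slugify(got) != got:
            violations.append(f"{raw!r}: not idempotent")
    return violations


def candidate_slugify(text: str) -> str:
    """One candidate implementation, as an agent might produce it."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


if __name__ == "__main__":
    print(check_slugify_spec(candidate_slugify))
    # A plausible-looking but careless candidate fails the same spec.
    print(check_slugify_spec(lambda t: t.lower().replace(" ", "-")))
```

The careless candidate looks fine at a glance, which is exactly the kind of subtle bug a reviewer has to catch. Writing the edge cases and properties down first is the specification skill; running them against whatever comes back is the verification skill.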

Get very good at knowing when to trust the AI and when not to. The document mentions that earlier models chose correct libraries only ~20% of the time while Opus 4.5 hits 100% in some tests. That’s great, but you still need the judgment to validate those choices for your specific context.

And honestly? Build things from scratch sometimes. Don’t copy-paste, don’t just accept what the model gives you. The micro-learnings from typing out code yourself, from understanding why things work – that builds the competence you need to supervise effectively. If you can’t build it, you don’t understand it. That’s still true.

The Bottom Line

Is coding fundamentally changed by Opus 4.5? Yes, I think so. Is the task of “a human doing the coding” dead in 2026? No, but it’s transforming.

The analogy I’d use: it’s like the transition from drafting by hand to CAD software. Architects didn’t stop designing buildings. But the craft of manual drafting largely disappeared. What remained was the higher-order skill – understanding structure, aesthetics, client needs, building codes. The tool changed; the profession adapted.

Software engineering is having its CAD moment. The craft of low-level code generation is being automated. What remains is the higher-order skill: understanding systems, specifying requirements, making architectural tradeoffs, and verifying correctness. That’s still engineering. It’s just engineering at a different layer of abstraction.

The people who will struggle are those who defined their identity around the typing. The people who will thrive are those who always saw the typing as just one part of building good software.