Notebook

Memo to Team

This past week has been a real hockey-stick moment in AI, as Mark would say. I have been playing with Opus 4.5 in Claude Code since last night. This iteration feels exponential rather than incremental.

Take note, this is a serious update.

I’m convinced we are observing a phase transition from “code completion” to “autonomous state management.” Opus 4.5 pushes the “vibe coding” horizon (the length of time you can code purely via intent) from minutes to hours, effectively shifting our team’s role from writing syntax to managing high-level system interventions. We need to take advantage of this democratisation of software development. The joke will be on us if we do not.

1. Intuition: The “Context” is Now the Code

Think of previous models (Sonnet 3.5, 4, 4.5) as bright junior developers who are excellent at writing a single function but lose the thread when the call stack gets too deep. They eventually “trip over their own feet” with circular logic or convoluted dependencies. This is why I have spent so much time on PRDs and HLDs. I am not sure I need to do that as much anymore.

From what I am observing as I instruct it, Opus 4.5 behaves more like a senior engineer with a massive, stable working memory. It doesn’t just predict the next token; it appears to hold the entire application state in its head. The transcript highlights that we are entering the “interventions only” phase of coding: you stop checking the compiler output (the implementation details) and start iterating purely on the requirements (the “vibe”).

2. The Formalism: Key Technical & Architectural Impacts

Here is the extracted data regarding Opus 4.5 that directly impacts our development lifecycle and architectural decisions.

Performance & Autonomy

  • SOTA Benchmarks: Opus 4.5 hits 80.9% on SWE-bench Verified, roughly three points ahead of GPT-5.1-Codex-Max (77.9%) and Sonnet 4.5 (77.2%).

  • The “Intervention-Only” Phase: I think usage will start to shift significantly towards an “interventions only” phase of coding, where engineers stop writing code in the IDE entirely and work, at most, in the command-line interface.
  • Extended Autonomy Horizon: The task-length barrier keeps moving out. Autonomous task execution now routinely stretches to 20 or 30 minutes without losing coherence.
  • End-to-End Generation: It is capable of “vibe coding” an entire app (e.g. Next.js) end-to-end without the user touching implementation details.
  • Design Iteration: Unlike previous models that claim a design is “done” prematurely, Opus 4.5 iterates autonomously until a design is pixel perfect (it uses Playwright).
  • Browser Agents: Agents that do automated browser tasks just got stupidly easy to build! The threat to automated browser products is huge.
  • Parallelism: The improved planning capability allows me to work on massive parallel streams. I saw a tweet this morning where someone claimed to have run 11 different projects in 6 hours successfully. I believe them. This is next level.

Architecture & Tooling (Crucial for Architects)

[image: Screenshot 2025-11-26 at 10.51.28]

  • Tool Search Tool: This one has me so excited but simultaneously so annoyed. I am pretty certain it has made 50% of my dev work this week obsolete. This new capability lets the model search across thousands of tools without us needing to “stuff” every definition into the context window up front. I had solved for this previously with an architecture similar to Agent Skills. Probably pointless work now. This is AI. This is the way.
  • Programmatic Tool Calling: This one is actually going to make Opus 4.5 cheaper than Sonnet 4.5, even though Opus is more expensive per token. It allows Claude to invoke tools directly in a code execution environment, which reduces the load on the context window.
  • Tool Use Examples: This is a universal standard for demonstrating correct tool usage to the agent. This is going to save so much time in context engineering.
  • Agent Integration: This has improved 10x. Hands down the “best model in the world” for coding agents, computer use and browser use. AutomatePro, Unify and the like are stuffed if they do not get on the bus. If SN wakes up to this they could turn ATF into a super agent.
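
To make the first two ideas concrete, here is a rough local sketch in Python. It is not Anthropic's API: the Tool and ToolRegistry classes, the tool names and the ticket data are all hypothetical, and the keyword ranking is a naive stand-in for whatever search the real feature performs. It only illustrates the shape of the pattern: tool definitions stay in a registry until a search pulls them in, and a tool is then called in a loop from code so that only the final total would need to go back into the context window.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    run: Callable[[str], int]


class ToolRegistry:
    """Keeps every tool definition outside the prompt until it is searched for."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def search(self, query: str, limit: int = 3) -> list[Tool]:
        """Return up to `limit` tools, ranked by keyword overlap with the query."""
        wanted = set(query.lower().split())
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower().replace("_", " ")
            score = len(wanted & set(text.split()))
            if score > 0:
                scored.append((score, tool))
        # Best match first; ties fall back to name order so results are stable.
        scored.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [tool for _, tool in scored[:limit]]


# Hypothetical data and tools, standing in for real integrations.
OPEN_TICKETS = {"platform": 4, "payments": 7, "mobile": 2}

registry = ToolRegistry()
registry.register(
    Tool("count_open_tickets", "count open tickets for a team",
         lambda team: OPEN_TICKETS[team])
)
registry.register(Tool("create_ticket", "create a new ticket", lambda title: 1))
registry.register(Tool("send_email", "send an email to a user", lambda address: 1))


def total_open_tickets(teams: list[str]) -> int:
    """Find the tool by search, call it in a loop, and return only the total."""
    tool = registry.search("count open tickets", limit=1)[0]
    return sum(tool.run(team) for team in teams)


if __name__ == "__main__":
    print([t.name for t in registry.search("count open tickets")])
    print(total_open_tickets(["platform", "payments", "mobile"]))
```

The point of the design is that three per-team results never need to be shown to the model; only the single number returned by total_open_tickets does.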

Economics & Efficiency

  • Cost Reduction: Input cost dropped to $5 per million tokens (from $15) and output to $25 per million tokens (from $75) compared to the previous Opus.
  • Efficiency > Rate: Like I said above, Opus is roughly 67% more expensive per token than Sonnet 4.5 ($5/$25 versus $3/$15 per million input/output tokens), but it requires 76% fewer reasoning tokens for complex tasks.
  • Total Cost of Task (TCT): Due to high one-shot success rates and token efficiency, the total cost per successful task is often lower than with Sonnet 4.5. That only holds on complex tasks, though, so we need more model switching and more sub-agent use.
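
A back-of-the-envelope version of the Total Cost of Task argument, as a Python sketch. The Opus 4.5 prices are the ones quoted above. The Sonnet 4.5 prices ($3 input, $15 output per million tokens) and the token counts are my assumptions, chosen only to illustrate the arithmetic, and I apply the 76% reduction to all output tokens, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per million input tokens
    output_per_million: float  # USD per million output tokens


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one task at the given per-million-token prices."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


OPUS_4_5 = Pricing(input_per_million=5.0, output_per_million=25.0)    # from this memo
SONNET_4_5 = Pricing(input_per_million=3.0, output_per_million=15.0)  # assumed

# Hypothetical complex task: same input for both models.
INPUT_TOKENS = 200_000
SONNET_OUTPUT_TOKENS = 100_000
# Assumption: the 76% reduction applies to every output token Opus produces.
OPUS_OUTPUT_TOKENS = round(SONNET_OUTPUT_TOKENS * (1 - 0.76))

sonnet_cost = task_cost(SONNET_4_5, INPUT_TOKENS, SONNET_OUTPUT_TOKENS)
opus_cost = task_cost(OPUS_4_5, INPUT_TOKENS, OPUS_OUTPUT_TOKENS)

if __name__ == "__main__":
    print(f"Sonnet 4.5: ${sonnet_cost:.2f}")  # Sonnet 4.5: $2.10
    print(f"Opus 4.5:   ${opus_cost:.2f}")    # Opus 4.5:   $1.60
```

Under these assumed numbers Opus comes out about 24% cheaper per task. On a short task with little output the higher input price dominates and Sonnet wins, which is the case for model switching.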

Team Impact

  • Productivity Spike: Internal Anthropic surveys reported a mean self-estimated productivity improvement of 220%.
  • Skill Shift: The hard part of the job is shifting entirely to requirements, goals, feedback, and architecture; “software engineering” as purely writing code may be nearing its end sooner than I, or anyone for that matter, thought.

Closing

The gradient is clear: we need to stop optimising for “code completion” speed and start optimising for “agentic orchestration.” The immediate next step for us is to integrate the Tool Search capability to offload context management (in all our agents and apps) and to benchmark our “intervention-free” duration on current user stories.
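
One possible way to measure “intervention-free” duration, sketched in Python. The definition is my assumption: the time between consecutive human interventions inside a single agent session. The function name and the example timestamps are hypothetical; in practice the timestamps would come from whatever session logs we keep.

```python
from datetime import datetime, timedelta


def intervention_free_stretches(
    start: datetime, end: datetime, interventions: list[datetime]
) -> list[timedelta]:
    """Gaps between consecutive human interventions within one agent session."""
    # Ignore any intervention logged outside the session window.
    inside = sorted(t for t in interventions if start <= t <= end)
    marks = [start, *inside, end]
    return [later - earlier for earlier, later in zip(marks, marks[1:])]


# Hypothetical session on one user story: two interventions in 90 minutes.
session_start = datetime(2025, 11, 26, 9, 0)
session_end = datetime(2025, 11, 26, 10, 30)
human_interventions = [
    datetime(2025, 11, 26, 10, 5),
    datetime(2025, 11, 26, 9, 20),
]

stretches = intervention_free_stretches(session_start, session_end, human_interventions)
longest = max(stretches)

if __name__ == "__main__":
    print([int(s.total_seconds() // 60) for s in stretches])  # [20, 45, 25]
    print(f"Longest stretch: {int(longest.total_seconds() // 60)} minutes")
```

Tracking the longest and the median stretch per user story would give us a baseline to compare against as we change models and tooling.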