Published April 18, 2026
#3: Opus 4.7 Tops Everyone, Claude Code Adds Routines, OpenAI Responds with Cyber Model
This Week's Highlights
On April 16 Anthropic shipped Claude Opus 4.7. It scores 87.6% on SWE-bench Verified (up from 80.8%) and 64.3% on SWE-bench Pro (up from 53.4%). On GDPval-AA it reaches 1,753 Elo, ahead of GPT-5.4 (1,674) and Gemini 3.1 Pro (1,314). Pricing is unchanged at $5/$25 per million input/output tokens. New in this release: the /ultrareview command in Claude Code, an xhigh effort level for deeper thinking, task budgets in public beta, and vision at 3× resolution (up to 2,576px on the long edge).
On April 14 Anthropic rebuilt the Claude Code desktop app: a sidebar with every active and recent session, a drag-and-drop layout, an integrated terminal and file editor, and a rebuilt diff viewer for large changesets. The second announcement is Routines: agents that run without an active session, triggered by a schedule, an API call, or a GitHub event (such as a new PR). Available on all paid tiers.
On April 14 OpenAI launched GPT-5.4-Cyber, a fine-tune of the flagship with lowered refusal thresholds on security tasks. It handles binary reverse engineering, vulnerability analysis of compiled code, and defensive tooling. Access runs through the Trusted Access for Cyber (TAC) program, which covers thousands of vetted defenders and hundreds of teams. By comparison, Anthropic's Glasswing has 9 partners.
New Tools
Gemini Robotics-ER 1.6
Google DeepMind shipped an updated reasoning model for robotics. Its pointing and counting score rose to 80% (from 61%). Boston Dynamics Spot now reads analog gauges at 98% accuracy (up from 23%). If you are building a robot or an ML pipeline that needs spatial reasoning, the model is available via the Gemini API and AI Studio.
Source: DeepMind
Kimi K2.6 Code Preview
Moonshot AI rolled out K2.6 to all Kimi Code subscribers. Compared to Claude Sonnet 4.6, it is 5× cheaper on input ($0.60 vs $3) and 6× cheaper on output ($2.50 vs $15). It also brings deeper reasoning traces and more reliable tool calls. If Claude Code is eating your budget, this is an alternative to consider.
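To see what those per-token prices mean for a bill, here is a minimal sketch of the arithmetic. It assumes the quoted prices are in USD per million tokens (the same unit used for Opus 4.7 above). The model keys are plain dictionary labels rather than API identifiers, and the workload numbers are hypothetical.

```python
# Prices in USD per million tokens, copied from the comparison above.
# The keys are illustrative labels, not official model identifiers.
PRICES = {
    "kimi-k2.6": {"input": 0.60, "output": 2.50},
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for the given token volumes."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200M input tokens and 20M output tokens per month.
WORKLOAD = {"input_tokens": 200_000_000, "output_tokens": 20_000_000}

if __name__ == "__main__":
    for model in PRICES:
        print(f"{model}: ${monthly_cost(model, **WORKLOAD):,.2f}")
```

The blended saving depends on your input/output mix: input-heavy workloads land closer to 5×, output-heavy ones closer to 6×.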
Source: Kimi
Cloudflare Code Mode MCP Server
An MCP server with aggressive token savings: it exposes just two tools, search() and execute(). The model writes JavaScript against a type-aware SDK and runs it in a V8 isolate, so it avoids loading every endpoint definition into context. For large-API integrations, this strategy is a must-try.
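The two-tool idea can be illustrated with a toy sketch. This is not Cloudflare's implementation: the real server runs JavaScript in a V8 isolate, while the version below uses Python for illustration only. The endpoint catalog, the FakeSDK class, and all endpoint names are hypothetical, and exec() is not a sandbox.

```python
import json

# Hypothetical API catalog; a real large API would have far more entries.
CATALOG = {
    "dns.list_records": {"params": {"zone_id": "str"}, "doc": "List DNS records for a zone."},
    "dns.create_record": {"params": {"zone_id": "str", "name": "str"}, "doc": "Create a DNS record."},
    "cache.purge": {"params": {"zone_id": "str"}, "doc": "Purge cached files for a zone."},
    "workers.list_scripts": {"params": {"account_id": "str"}, "doc": "List Worker scripts."},
}


def search(query: str) -> dict:
    """Tool 1: return only the endpoint definitions that match the query."""
    q = query.lower()
    return {
        name: spec
        for name, spec in CATALOG.items()
        if q in name.lower() or q in spec["doc"].lower()
    }


class FakeSDK:
    """Stand-in for a typed SDK; it records calls instead of using the network."""

    def __init__(self):
        self.calls = []

    def call(self, endpoint: str, **params):
        if endpoint not in CATALOG:
            raise KeyError(f"unknown endpoint: {endpoint}")
        self.calls.append((endpoint, params))
        return {"endpoint": endpoint, "ok": True}


def execute(code: str, sdk: FakeSDK):
    """Tool 2: run model-written code against the SDK and return `result`.

    WARNING: exec() provides no isolation; a real server needs a proper sandbox.
    """
    scope = {"sdk": sdk, "result": None}
    exec(code, {"__builtins__": {}}, scope)
    return scope["result"]


if __name__ == "__main__":
    full_size = len(json.dumps(CATALOG))
    hit_size = len(json.dumps(search("dns")))
    print(f"all definitions: {full_size} chars, search('dns'): {hit_size} chars")
    print(execute('result = sdk.call("cache.purge", zone_id="z1")', FakeSDK()))
```

The saving comes from search() returning a small slice of the catalog on demand, so the context cost scales with what the task touches rather than with the size of the API.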
Source: InfoQ
Recipe of the Week
Code Review Agent with Claude Agent SDK
Code review is a perfect task for an AI agent: it requires reading context, finding related files, and applying different criteria to different parts of code. We break down how to build an agent that reviews code like an experienced engineer — prioritizing findings, searching for context, and posting a structured report directly to GitHub.
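As a rough illustration of the "prioritizing findings" and "structured report" steps, here is a minimal sketch. It does not use the Claude Agent SDK: the Finding type, the severity labels, and the report layout are all assumptions made for this example, and posting the result to GitHub is left out.

```python
from dataclasses import dataclass

# Hypothetical severity scale; lower rank means higher priority.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2, "nit": 3}


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    severity: str
    message: str


def prioritize(findings):
    """Sort findings by severity first, then by file path and line number."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f.severity], f.path, f.line),
    )


def render_report(findings) -> str:
    """Render a plain-text report, most important findings first."""
    lines = ["Code review summary", ""]
    for f in prioritize(findings):
        lines.append(f"- [{f.severity.upper()}] {f.path}:{f.line} {f.message}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Finding("api/users.py", 88, "nit", "Prefer an f-string here."),
        Finding("api/auth.py", 42, "critical", "Token is logged in plain text."),
        Finding("api/users.py", 12, "major", "Query runs inside a loop."),
    ]
    print(render_report(sample))
```

An agent would fill the list of findings from its own analysis; keeping the ranking in ordinary code makes the ordering of the final report deterministic and easy to test.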
Useful Link
We Tested Anthropic's Redesigned Claude Code Desktop App and 'Routines' (VentureBeat)
A hands-on review of the new Claude Code desktop app and Routines, built around concrete scenarios: how many parallel agents it handles, how Routines behave on failures, and what you still need to fix manually. Bookmark it if you are rolling this out.
Stat of the Week
$242B — 80% of all global venture capital in Q1 2026 — went to AI companies (Crunchbase). Four of the top five venture rounds in history closed this quarter: OpenAI ($122B), Anthropic ($30B), xAI ($20B), Waymo ($16B). Looks like in 2026 "venture capital" means AI by default.