AI Waves #6: Anthropic gated capability, Nvidia open-sourced a frontier model, and the buffer ran out

Intelligence at the deployable frontier is commoditising

Apr 23, 2026

AI Waves #6 | April 22, 2026 | Nazaré Ventures

Previous issues: #1 | #2 | #3 | #4 | #5

This week’s thesis: the buffer has narrowed.

A month ago Dario Amodei put the open-source and Chinese frontier at six to twelve months behind. After the past seven days, that buffer looks a lot thinner. Intelligence at the deployable frontier is commoditising, and the labs are adjusting accordingly.

The model that chose to be less dangerous.

Anthropic’s Claude Opus 4.7 is the first generally available frontier model trained with explicitly reduced offensive capability. Security researchers who need the full capability set can apply through its new Cyber Verification Program. Two days earlier OpenAI shipped GPT-5.4-Cyber with Trusted Access, same structure. Both closed labs now gate capability by trust level as well as price. The UK AI Safety Institute’s Mythos evaluation, published the same week, explains why: 73 percent on expert-level capture-the-flag, and the first model to autonomously complete a 32-step network attack.

Amazon committed $25 billion more to Anthropic in the same window: $5 billion lands immediately at a $380 billion valuation, alongside five gigawatts of Trainium capacity over ten years. The closed frontier is getting more expensive, and more selective about who gets it at full capability.

The gap that wasn’t.

Two Chinese frontier releases landed in the same seven days, and Nvidia open-sourced its own frontier model alongside them.

Kimi K2.6 shipped April 20: 1T-parameter MoE, open-source, 58.6 on SWE-Bench Pro, designed for twelve-hour autonomous coding runs. It is priced an order of magnitude below Opus 4.7. Qwen3.6-Max-Preview, Alibaba’s closed flagship, shipped the same day: 260K context, top-ranked on six of Alibaba’s selected coding benchmarks. Both labs claim SOTA under different evaluation setups, which is the messier reality.

Nvidia shipped Nemotron 3 Super into the same window: 120B-total / 12B-active MoE, open weights with ten trillion pretraining tokens, fifteen RL environments, and full training recipes under a permissive licence. The company whose silicon underwrites the AI boom is now a serious open-weight publisher in its own right.
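For readers less familiar with mixture-of-experts sizing, here is a minimal sketch of what the "120B-total / 12B-active" figures imply. The function name `moe_summary` is hypothetical, and the compute figure relies on the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass, which is an approximation and not an Nvidia-published number.

```python
def moe_summary(total_params: float, active_params: float) -> dict:
    """Summarise a mixture-of-experts model's per-token footprint.

    Hypothetical helper for illustration only.
    """
    if not 0 < active_params <= total_params:
        raise ValueError("active_params must be in (0, total_params]")
    return {
        # Share of the weights that participate in each token's forward pass.
        "active_fraction": active_params / total_params,
        # Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter
        # per token for the forward pass.
        "approx_forward_flops_per_token": 2 * active_params,
    }


# Figures taken from the text: 120B total parameters, 12B active per token.
nemotron = moe_summary(total_params=120e9, active_params=12e9)
print(f"{nemotron['active_fraction']:.0%} of weights active per token")
print(f"~{nemotron['approx_forward_flops_per_token']:.1e} forward FLOPs per token")
```

The point of the split is that per-token compute tracks the 12B active parameters, while the full 120B of weights generally still has to be held in memory to serve the model.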

I’m an open-model power user and a fan, but I remain unconvinced any of these three is meaningfully better than Opus 4.7 for the work I actually do. Published benchmarks become training targets, and training targets are not the same thing as capability. As Goodhart’s Law has it, when a measure becomes a target, it ceases to be a good measure. We have a portfolio company working on this, as you’ll see below.

Open models have tracked the frontier closely since the DeepSeek moment of January 2025. For a growing share of real workloads the available models are already good enough, and once they are good enough, the axes that matter are cost, latency, and reliability. Horace Dediu reports that open-source models now power 80 percent of startups seeking VC funding. That number describes where the builders already are.

Follow the labs’ incentives and a three-tier structure is hard to avoid. Each lab’s most capable model is the one that helps it build the next one, and releasing it gives away the engine of future self-improvement. As the frontier labs approach IPO and need to show durable moats, withholding the top tier stops being a safety decision and becomes a business one. Underneath that, the labs keep shipping mid-tier models through metered APIs because competition forces them to. And underneath that, the open-weight ecosystem keeps compounding on its own machinery. Three tiers, each optimising against different incentives, and each further from the last.

Self-improving AI went strategic.

The defence of the top tier runs through self-improvement.

Sergey Brin is personally leading a DeepMind strike team under Sebastian Borgeaud to close Gemini’s coding gap. Internal memo: “turning our models into primary developers.” Same week, Recursive Superintelligence (four months old, founded by Richard Socher, Tim Rocktäschel, Josh Tobin, Jeff Clune, and Tim Shi) raised $500M+ at $4B pre-money from GV and Nvidia. The thesis is in the name.

The recursive loop improving open weights (distillation, RL against verifiable rewards, cheaper hardware) does not require frontier lab cooperation to keep running. This week a Google co-founder made the loop his personal project, Nvidia put nine figures into a company whose only thesis is the loop, and Nvidia released a frontier-class open-weight model with the RL environments bundled. Nvidia is now funding the loop on one side and distributing the tools to run it on the other.

Agents as architecture.

If self-improvement is the labs’ fight, agents are everyone else’s.

Cloudflare shipped its agent-native AI Platform on April 16. Google Cloud Next 2026 went further six days later: Agent Studio, Agent-to-Agent Orchestration, Agent Registry, Agent Identity, Agent Gateway, Agent Observability, Workspace Studio agents across Gmail and Docs, a TPU 8i tuned for concurrent-agent inference, and a $750 million partner fund. The Identity, Registry, and Gateway primitives are the agent-infrastructure layer that has to exist before multi-agent workloads can be managed at scale. Google standardised them first. The thesis from #5 is now in shipping product.

What We’re Watching

Cursor into the Musk orbit. SpaceX took an option to acquire Cursor by year-end at $60B (or $10B for continued collaboration). xAI (merged with SpaceX in February at $1.25T) is already providing Colossus compute for Cursor’s Composer model. Bloomberg frames it as xAI catching up in coding. The more interesting read is vertical integration of proprietary models: a robotics-specific model for Optimus, a driving-specific model for Tesla, a rocket-specific model for SpaceX, and X as the data flywheel feeding Cursor’s foundational Composer model, which acts as the baseline for the rest. OpenAI shipped GPT-Rosalind for life sciences on April 16, the same move from the horizontal side.

MCP’s first single point of failure. OX Security disclosed a remote code execution flaw in Anthropic’s official MCP SDKs, affecting roughly 200,000 servers including Cursor, VS Code, Windsurf, Claude Code, and Gemini-CLI. Anthropic called it “expected behaviour.” Agent plumbing is becoming critical infrastructure faster than it is being secured. The trust and verification layer is the hard part, and retrofitting it tends to be expensive.

Portfolio

Prime Intellect: Zapier on Lab, FrontierSWE on the Hub. Zapier’s AutomationBench runs on Prime Intellect’s Lab platform, and frontier models score under ten percent on its real business workflows. FrontierSWE launched on the Environments Hub four days earlier. Prime Intellect’s stack is becoming the default substrate for evaluating the agentic frontier.

Intelligent Internet: AI derives novel physics. II published “The Cosmological Constant Is Positive” on April 22: Logos, II’s first-principles reasoning system, derives that the sign of Λ is forced by the algebra of spacetime symmetries. In 109 years no physicist has derived Λ’s sign from first principles. Whether the result survives scrutiny or not, it is the first non-trivial theoretical physics claim from an AI reasoning system. The commercial thesis is automated first-principles reasoning: drug discovery, materials, engineering.

Provably: CCS 2026 acceptance for SNARKless verifiable databases. qedb gives verifiable SQL queries with proof size independent of database size, using bilinear pairings rather than general-purpose SNARKs. As agentic systems query private data at ever-larger stakes (the MCP disclosure above makes this concrete), the verifiability layer is moving from academic curiosity to production requirement.

LayerLens: independent evaluation for a benchmark-gaming era. Kimi K2.6, Qwen3.6-Max, Nemotron 3 Super, and Opus 4.7 all claim frontier positions on overlapping SWE-Bench variants with different evaluation setups. LayerLens exists to strip the asterisks off those numbers. Atlas v1 has been fully public since October.

The pattern.

A month ago consensus held that frontier capability was an American duopoly with a buffer of up to twelve months. Over seven days the buffer narrowed. Open models matched the deployable frontier, Nvidia joined the open side, and the labs’ response patterns became legible: capability gating, self-improvement strike teams, vertical integration. Underneath it all, the open-weight ecosystem keeps compounding on its own machinery. The recursive loop is producing output, and the infrastructure beneath it is what compounds.