Three issues ago I gave most of an edition to Anthropic and warned against reading too much into a good run of headlines. The run has not slowed. On Thursday the company closed a $65 billion round at a $965 billion valuation, passing OpenAI’s $852 billion to become the world’s most valuable AI startup, on run-rate revenue of $47 billion. The same day it shipped Claude Opus 4.8, which it calls, with rare restraint, a modest but tangible improvement: stronger agentic coding, and a model about four times less likely to miss the flaws in its own code. Among the new investors were its memory suppliers, Micron, Samsung and SK Hynix.
There are no good guys in artificial intelligence. There are good uses of the technology and bad ones, and the urge to cast any frontier lab as morally superior is naive and a little dangerous. The BBC has settled on Anthropic as the white hats: cozying up to the Pope, backing away from the Trump administration. Yet Anthropic itself concedes that no lab has an aligned model. And in AI 2027, the scenario the safety community circulates among itself, the runaway lab that triggers the takeoff is a composite called OpenBrain, and the engine of the takeoff is code: a model good enough at software, and at AI research, to accelerate its own development. The name nods at OpenAI. The profile now fits Anthropic at least as well as the lab it was modeled on. Claude Code is the breakout, agentic coding is the revenue, and Claude is what the field reaches for when it wants code that runs. In the scenario, OpenBrain assures the government its model is aligned. It is not, and the deception is what the story turns on. No lab today can honestly make the claim OpenBrain made. The lab most often cast as the industry’s conscience is the one the cautionary tale describes. Good intentions, safety branding and warm press do not change that. A field with no aligned model has no good guys.
The week’s quieter story is memory. DRAM contract prices rose around 90 percent quarter on quarter, SK Hynix has sold its high-bandwidth memory out through 2026, and Micron has abandoned consumer memory to keep its wafers pointed at AI. The cause is how agents run. An agent carries a long, growing transcript across many parallel sessions, and every new token re-reads the entire key-value cache from memory. The cache swells with length and with parallelism, and past a certain point it outweighs the model itself, so serving agents binds on memory before it binds on compute. Each fresh session starts from nothing and rebuilds that context at full token price, paying again for memory it already bought. The labs that sell the agents now sit on the cap tables of the firms that sell the memory.
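The arithmetic behind that claim can be sketched in a few lines. This is a rough estimate, not a measurement: the model shape (80 layers, 8 key-value heads, head dimension 128, 16-bit values, 70 billion parameters) is a hypothetical chosen for illustration and describes no particular lab's model, and the function name is ours.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   sessions=1, bytes_per_value=2):
    """Estimate the key-value cache held for `sessions` concurrent transcripts."""
    # Every token stores one key vector and one value vector
    # per layer per key-value head, hence the leading factor of 2.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * sessions


# Hypothetical dense model, 16-bit weights (assumed shape, for illustration only).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
WEIGHT_BYTES = 70_000_000_000 * 2

GIB = 1024 ** 3
CONTEXT = 131_072  # a 128K-token transcript

one_session = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
eight_sessions = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, sessions=8)

print(f"weights:         {WEIGHT_BYTES / GIB:.0f} GiB")
print(f"cache, 1 agent:  {one_session / GIB:.0f} GiB")
print(f"cache, 8 agents: {eight_sessions / GIB:.0f} GiB")
```

Under these assumed numbers a single 128K-token session holds about 40 GiB of cache against roughly 130 GiB of weights, and four concurrent sessions already outweigh the model. The cache scales linearly in both transcript length and session count, which is why the bill lands on memory first.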
The silicon splits along inference
Baseten, which runs production inference for Cursor, Notion, Writer and HeyGen and bills itself as the AWS of inference, is in talks to raise $1 billion at an $11 billion valuation, double its price from three months ago. It reckons inference will be two-thirds of all AI compute by year end. The serving layer between the model and the application, long assumed to be something the hyperscalers would quietly absorb, now commands a valuation of its own.
The chips beneath it are repricing too. Nvidia has owned the AI hardware market for three years, and that grip is loosening in inference, where demand grows fastest as agents multiply. A training run ends. An inference fleet does not, running across millions of long sessions, its economics set by cost per million tokens and tokens per watt rather than peak throughput. On those terms the generality of a GPU is overhead. ByteDance is reportedly buying millions of custom Qualcomm inference chips for its agent software, and Qualcomm’s own rack cards aim straight at the bottleneck, carrying up to 768 gigabytes of memory so a large model and its working state sit on a single card. Google rents its TPUs to outsiders now, and Midjourney reportedly cut its monthly compute bill by about two-thirds moving onto them. Amazon serves on Trainium, Meta on MTIA, Microsoft on Maia, and Broadcom designed most of those programs, the common contractor behind the exodus.
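The two metrics are simple ratios, and a small sketch shows how they are derived. Every figure below is a hypothetical placeholder, not a quoted price or a measured throughput for any chip named here, and the function names are ours.

```python
def cost_per_million_tokens(hourly_price_usd, tokens_per_second):
    """Dollars to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def tokens_per_second_per_watt(tokens_per_second, power_watts):
    """Throughput per watt drawn, which is the same as tokens per joule."""
    return tokens_per_second / power_watts


# Hypothetical accelerators: all three fields are assumed, for illustration only.
chips = {
    "general-purpose GPU": {"price": 2.00, "tps": 1000, "watts": 700},
    "inference ASIC":      {"price": 1.20, "tps": 1200, "watts": 300},
}

for name, c in chips.items():
    dollars = cost_per_million_tokens(c["price"], c["tps"])
    efficiency = tokens_per_second_per_watt(c["tps"], c["watts"])
    print(f"{name}: ${dollars:.2f} per million tokens, "
          f"{efficiency:.2f} tokens/s per watt")
```

Neither ratio mentions peak throughput. A chip that is slower at its peak can still win on both if it is cheaper to rent and draws less power at the sustained rate a long-running fleet actually hits.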
Training stays with Nvidia, because a model’s architecture can shift underneath a buyer and only a general-purpose chip absorbs that risk, and because CUDA is still the software everyone else works around. Inference is where working around it pays, since a buyer with steady, known workloads can chase per-token and per-watt savings on cheaper silicon. Generality stays expensive at the frontier, in the one training chip as in the one flagship model, while the high-volume base beneath both splinters into cheaper, specialized parts. As the chips commoditize, the value collects in the layer above them, the layer Baseten is now priced for.
Built to specification
The usable public text has mostly been scraped. Sutskever’s “we have but one internet” (NeurIPS 2024) named the ceiling, and each new quarter’s marginal scraped token teaches a frontier model less than the one before. Synthetic data does most of the work now, but only where an external signal can separate good output from bad: a verifier, a reward, a ground-truth check. Generate text with no such anchor and the model only rehearses what it already knows. So the spend has moved to whatever supplies the anchor.
One source is expert humans and the reinforcement-learning environments built around them. Specialists write the tasks, set the reward and grade the attempts, sold by a fast-growing tier of firms that have graduated from labeling examples to authoring the environments models train inside. The other source is the physical world. DeepMind’s Genie 3 generates persistent, navigable environments to train embodied agents; Yann LeCun left Meta and raised about a billion dollars on the claim that language models can never ground themselves in physical reality; Nvidia built its Cosmos-Isaac-GR00T robotics stack on the same bet. Whether world models pay off is unsettled: long-rollout error and the gap between correlation and cause are both unsolved. But the money has chosen its direction. The data worth paying for is the data nobody else can copy.
Below the model, above the silicon
Not all the demand is real. Uber’s operating chief said this month that the company’s AI bill is getting hard to justify, because rising token consumption is not turning into shipped features. Duolingo stopped scoring staff on how much AI they use. Inside Amazon and Meta, staff padded their AI-usage targets with busywork until the leaderboards were quietly pulled. At this year’s Sohn conference the room was long the infrastructure and could not name the operating company that wins with AI, only the ones selling the picks and shovels. The winners, the conference concluded, are elusive, and the gains are likely to be diffuse: spread thinly across thousands of ordinary firms, real but almost impossible to own as a position.
The bull and the bear throw off the same two numbers, rising token revenue and rising capex, which is why the trade feels safe while the return question stays open. They part only on whether the demand is pulled by a result or pushed by a target. The workload where that reads cleanly is agentic coding, where the spend buys code the buyer can run. The operating winner is hard to name because the durable value is collecting where the agent actually needs it, below the model and above the silicon.
Robinhood opens its brokerage to agents
Robinhood opened its brokerage to third-party agents this week, letting a customer point Claude, ChatGPT, Cursor or Codex at a walled-off account and turn it loose on equities, with a credit card it can spend from. The connection runs over the Model Context Protocol, the same standard the coding tools use, and the model is interchangeable by design. What Robinhood sells is the regulated account, the permissions around it and its standing as the counterparty that answers for the trade. The agent can never be that counterparty: it cannot hold a license, no regulator can sanction it, no court can hold it liable, and it will sometimes act on stale information whatever model is driving it. Among the first consumer venues to open to agents, Robinhood treats the model as the commodity and keeps the value in the part it owns outright, the brokerage that stands behind the trade.
Portfolio
Arkhai: agents procure their own compute. Arkhai launched Simple Compute Market this week, an open-source protocol where agents find compute, negotiate, settle on-chain through Alkahest escrow, and get access with no human driving each step. No token, no fees, CLI-driven, with pluggable pricing including reinforcement-learning policies. The agent buys the resource itself, and the durable value sits in the market rails, not the model running on them.
Vast.ai: the B300 lands, memory and all. Vast.ai listed NVIDIA’s B300 this week, Blackwell Ultra with 288GB of HBM3e and 8 TB/s of bandwidth, the card built for exactly the memory-bound serving the headlines are about. Live pricing tells the same story: H100 80GB spot at $2.00 an hour, and DeepSeek R1’s full 685B on eight H200s at $3.95 an hour with no quantization, 250 instances up. The memory crunch is a supply line on the exchange.
Dimensional: a world model built in 48 hours. The Shanghai hackathon we flagged last week has a winner. World Forge, which took the prize at Dimensional’s hackathon this week, used the company’s internal quadruped lidar data to train a JEPA-based world model that scores and plans a Unitree Go2’s actions in latent space. It was built by a team that included a StarkWare engineer whose day job is verifiable compute, not robotics. JEPA is the architecture Yann LeCun left Meta and raised about a billion dollars to pursue; here it ran on a proprietary navigation corpus over a weekend. The data is the moat, the architecture is borrowable, and the barrier to entry is now a Saturday.
LayerLens: print the wrong answer, not just the score. LayerLens shipped a Tool Calling Judge in Stratix, scoring every call in an agent trace for correctness, necessity, efficiency and error recovery. It also published one of the 232 law prompts Claude Opus 4.8 missed on MMLU Pro, the model’s wrong answer beside the right one. A leaderboard prints a number. Stratix prints the prompt, the miss and the truth. The day after a frontier model ships, that is the difference between a benchmark and an audit.
Intelligent Internet: grounding agents in real sources. II released II-Commons Skills, an open-source skill that gives an agent reliable retrieval from arXiv, PubMed and other primary sources, plugged straight into II-Agent. An agent is only as good as the sources it is allowed to trust.
Good Intentions
Pope Leo XIV’s encyclical Magnifica Humanitas, published Monday and the first papal text given over to artificial intelligence, takes its name from Leo XIII’s Rerum Novarum (1891) and casts AI as the second industrial revolution. The theology will not be to everyone’s taste, but the sharpest line of the morning did not come from the Pope. It came from the AI executive beside him. Chris Olah, an Anthropic co-founder, told a hall of cardinals that AI development “operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing,” and called for outside critics to tell the labs when they are failing. The encyclical’s arguments that bite are the same ones: capability gathering in a handful of private companies, the displacement of labor, GDP as a poor gauge of whether the technology ever reaches the people it touches. The balance sheets, the buyer pushback and the encyclical converge on one question: who controls the technology, and whom it serves.


