Learning Machines All The Way Down
Why AI is forcing companies and institutions to learn how to learn.
Through all of these “AI layoffs” announcements and “jobs apocalypse” memes, one thing is clear: organizations are reorganizing themselves to accommodate the arrival of AI.
They are remodeling themselves to look like what is driving the change in the first place.
AI is itself a learning machine, it spawned its own adaptive user (agents), and the smartest way to remake companies and institutions is to rebuild them as AI-powered learning machines too.
This unifies two arguments from earlier in the Robot Wave arc:
Models Aren’t Moats: enduring AI-native businesses are the ones building specialized intelligence around the model, compounding their value as the frontier improves beneath them.
The AI-Enhanced Operator: humans who understand AI are the smallest indivisible piece of specialized intelligence. These humans are thus in high demand but in short supply because mastering AI is impossible (the target never stops moving) and requires a unique temperament for working permanently at the edge.
Building enduring AI-native businesses means finding AI-Enhanced Operators and giving them as much leverage as possible to drive business impact. It also means structuring the organization itself as a domain-specific AI-powered learning machine.
The Machine Before the Machine
You might say companies have always, in some sense, been learning machines. Capitalism and market dynamics are self-correcting mechanisms designed to respond to a reward function.
Take it from Markie Wagner, who herself is building a self-proclaimed learning machine:
“Capitalism is organizational evolution. Millions of businesses compete in the marketplace with offerings that they think customers will want. Some thrive and grow. Others die. Each company evolves, too. People come and go. An experiment becomes a process, a process becomes a web of tacit knowledge. Products are introduced, and products are retired.”
Citadel, for example, has been one of the most ruthlessly successful learning machines for the better part of thirty years, growing into one of the biggest hedge funds in the world, now managing $68 billion (more than $19 billion of it Griffin’s and his colleagues’ own money) and delivering a 19% average annual net gain since 1990 with only two losing years. By net dollar gains since inception, LCH Investments ranks it the most profitable hedge fund in history.
The fund’s durable advantage is the apparatus it has built to improve as it learns over time. The New Yorker describes Griffin’s mission as updating the firm’s strategies and infrastructure so relentlessly that its edge holds for decades.
“His mission has always been different: to build finance businesses that update their strategies and infrastructure so relentlessly that they beat rivals not just today but over decades. Paradoxically, maintaining a consistent edge requires constant, unsentimental internal change—of processes, technology, and people. Citadel takes ideas that are just beginning to circulate and improves them, with math or technology or data that others haven’t thought to use.”
This famously extends to the firm’s culture and org chart, too. Although it did not invent the “pod” structure, Citadel helped make pods one of the defining organizational forms of modern equity investing: small, sector-specialist teams operating within a centralized system for risk and capital allocation. Each team operates autonomously and is judged on performance. Top performers get more money and underperformers get cut.
Now, a learning machine is only as good as its reward function, and finance happens to have a great one: profit and loss. Implicit within that reward function is a common unit by which everything is measured: the dollar. The dollar collapses heterogeneous, incommensurable outcomes onto a single ordered axis. It’s fungible, divisible, transferable, and exogenous. The dollar is why P&L works as a reward function. Citadel’s efficiency as a learning machine thus comes from three things:
A verifiable reward function: P&L within a given window. The environment hands back unambiguous, fast feedback, in a unit that lets you compare every desk, strategy, and trader on one axis. The signal is clean and strong.
Ken Griffin & elite talent: An elite operator orchestrating the machine with the judgment to act well on the signal, and elite individual managers improving the whole.
The organization: a collective designed by the operator, built to run iterative loops against the reward function: allocate more capital to what works and cut what doesn’t. The gains compound. A good reward function is wasted without a human to build a structure that properly learns from it.
Citadel is a useful example because it shows how successful a learning machine can be when it satisfies three core criteria: a good reward function, one or more operators willing to act on the signal, and an organization built to metabolize feedback.
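To make "allocate more capital to what works and cut what doesn’t" concrete, here is a minimal sketch of one review cycle. It is an illustration only, not a description of how Citadel actually allocates capital: the `Pod` class, the `reallocate` function, the 5% cut threshold, and the dollar figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str       # hypothetical team label
    capital: float  # dollars allocated at the start of the window (assumed > 0)
    pnl: float      # profit or loss over the review window, in dollars


def reallocate(pods, cut_below=-0.05):
    """Cut pods whose return is under cut_below, then hand the freed
    capital to the survivors in proportion to their positive P&L."""
    survivors = [p for p in pods if p.pnl / p.capital >= cut_below]
    cut = [p for p in pods if p.pnl / p.capital < cut_below]
    freed = sum(p.capital + p.pnl for p in cut)          # what the cut pods had left
    gains = sum(max(p.pnl, 0.0) for p in survivors)      # total positive P&L
    result = []
    for p in survivors:
        if gains > 0:
            share = max(p.pnl, 0.0) / gains
        else:
            share = 1.0 / len(survivors)                 # nobody made money: split evenly
        # Each survivor keeps its own capital and P&L, plus its share of freed capital.
        result.append(Pod(p.name, p.capital + p.pnl + freed * share, 0.0))
    return result


# Illustrative numbers only.
pods = [Pod("A", 100.0, 20.0), Pod("B", 100.0, -10.0), Pod("C", 100.0, 2.0)]
next_window = reallocate(pods)
```

The sketch only works because every pod is scored in the same unit. Swap dollars for something that cannot be compared across teams and the ranking step has nothing to sort on.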
AI makes that structure more broadly available, and makes becoming a good learning machine more valuable. By making parts of the learning process cheaper and faster to run at scale, it lowers the cost of the basic learning loop: try something, observe what happens, compare it against the goal, and adjust.
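The basic loop described above (try something, observe what happens, compare it against the goal, adjust) can be sketched in a few lines. This is a toy hill-climbing example under stated assumptions: `learning_loop` and the quadratic `reward` are hypothetical stand-ins, and a real organization’s reward is never this clean.

```python
def learning_loop(reward, start, step=1.0, rounds=50):
    """Repeatedly try nearby options and keep whichever scores best."""
    current = start
    best = reward(current)
    for _ in range(rounds):
        improved = False
        for candidate in (current - step, current + step):  # try something
            score = reward(candidate)                        # observe what happens
            if score > best:                                 # compare against the goal
                current, best = candidate, score             # adjust
                improved = True
        if not improved:
            step /= 2  # nothing nearby is better, so search more finely
    return current, best


def reward(x):
    # Hypothetical reward: highest when x is exactly 7.
    return -(x - 7.0) ** 2


best_x, best_score = learning_loop(reward, start=0.0)
```

Everything interesting lives in `reward`. Making each pass cheaper and faster, which is what AI does, only helps if that function points at something worth finding.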
Signals and Sensibility
But more experimentation doesn’t necessarily mean more learning. A system only learns when it knows what good looks like. That was hard before AI, and it will remain hard after AI.
AI is often strongest where the reward is verifiable: math, code, games, formal logic, etc. LayerLens has chronicled how long games have served as a proving ground for machine intelligence. (Disclosure: LayerLens is a Nazaré Ventures portfolio company.) Reasoning models improved in part because reinforcement learning from verifiable rewards (RLVR) satisfied this condition: output was evaluated against answers known to be correct.
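A verifiable reward can be as simple as checking an output against an answer known to be correct. The sketch below shows only that scoring step, not how any lab actually trains a model; the function names and the sample attempts are hypothetical.

```python
def verifiable_reward(answer: str, reference: str) -> float:
    """Return 1.0 when the answer matches the known-correct reference, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def score_attempts(attempts, reference):
    """Average reward over a batch of attempts at the same problem."""
    rewards = [verifiable_reward(a, reference) for a in attempts]
    return sum(rewards) / len(rewards)


# Hypothetical attempts at "what is 17 * 3?"; the reference answer is "51".
attempts = ["51", " 51 ", "41", "fifty-one"]
batch_score = score_attempts(attempts, "51")
```

Note that "fifty-one" scores zero even though a human would accept it. Exact-match rewards are cheap and unambiguous, which is why they work well in math and code and poorly in most of the rest of working life.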
The same dynamic also explains some of the “jagged frontier” that makes AI more difficult to understand. AI doesn’t improve evenly across everything humans consider difficult because it improves fastest where attempts produce feedback the system can actually use.
Anthropic and OpenAI know this, too, which is why they’ve both optimized for programming with Claude Code and Codex. Code follows a set of observable, verifiable rules, meaning AI can do it well and improve extremely rapidly. It should come as no surprise that coding harnesses have become the first major commercial market for agentic AI.
The broader problem is that AI applied to the world writ large (especially to work and jobs) doesn’t work as well as it does in software. Real life doesn’t have a “common unit to compare outcomes.” That hasn’t stopped folks from trying to find one, though, as Markie Wagner articulated in the same essay quoted above:
“The promise of AI is that it will turn businesses into software so that they can evolve over millions of tiny iterations. Beautiful, ideal, complex things can only emerge as the result of tremendous trial and error over time.”
Applied AI works best when the job in question can be made legible to software. “Turning businesses into software” thus means identifying which parts of the work can be made legible, then structuring them so AI can accelerate the learning loop: attempt possibilities at scale, observe, compare outcomes, and improve.
But this, it turns out, is hard: most jobs aren’t naturally legible to software.
As we’ve written before, the frontier labs know this, which explains why they feel required to invest hundreds of millions of dollars into “enterprise services companies” embedding “FDEs” into businesses to teach folks how to use their products.
Although it might be clear that AI is going to transform many, many things, it is not yet clear how. Take the tokenmaxxing frenzy, for example.
Tokenmaxxing was an early attempt to invent a common unit, and it failed because it confused usage with value. Tokens are easy to count, easy to compare, and widely available (did someone say commodity?), which made them tempting as a proxy for productivity.
But tokenmaxxing only tracked consumption, not what that consumption represents. It lacked any causal relationship to productivity, value, or material business impact, making it useless.
This is my whole point in Code Isn’t a Coup: the models are outstanding (and improving), but we’re a long way from the models themselves driving life-or-death outcomes. AI is, at the moment, a paradigm-shifting tool. That tool definitely warrants reorganization around it, but it’s not yet an outcome in and of itself.
What Good Looks Like
Which brings us back to humans. Contrary to the headlines and popular belief, we’ll need a lot of them, because although AI can do a growing share of the work downstream of a reward signal, it can’t do what matters most.
Tokenmaxxing failed because it measured usage instead of value. The important human role is defining the value side of the equation: what’s the business logic? How do we measure it? What are good representative proxies? How might we make parts of what we do legible to this incredible new tool?
Finance had its common unit handed to it. Most domains do not, so the operator has to build one: a local, defensible proxy that stands in for value where no universal measure exists. AI cannot build that proxy for you because the system optimizes only the unit it is given.
That kind of definition is useless if it arrives after the fact. All the hallmark buzzwords of the moment (taste, judgment, vision, etc.) have to inform the loop before AI starts generating outputs, because the system will improve whatever the process teaches it.
People who understand AI, their business, and their customers have to be closer to the design of the loop itself. This is the peak application of the AI-enhanced operator’s skillset.
Because AI creates new kinds of work before anyone knows how to evaluate them, the operator embraces ambiguity and discovers what good AI-powered work looks like by doing it, compounding their own learning in the process. They author and audit the reward function, then evaluate the result and repeat.
Unfortunately, too few people can do this well at the moment, which is why everyone’s reaching for the word “agency.” Credentials and experience matter less than temperament: curiosity, tolerance for discomfort, strong opinions loosely held, and the willingness to constantly update your worldview. Identifying these rare individuals is where AI-native value begins to compound.
New Form of the Firm
Rem Koning recently published a study on what makes a firm “AI-Native.” He distinguishes between what he calls process and product channels. In his words, the process channel involves using agents like Claude or ChatGPT to move faster with a lean team, whereas the product channel sells AI that does work a human used to do.
According to Koning, process changes don’t reorganize the firm nearly as much as product changes do, because the economics of building a company around an AI product require a fundamentally different structure. He concludes that AI startups are smaller, flatter (half a layer fewer), more engineer-heavy, more senior, have fewer managers, and are more efficient per employee (by valuation).
[Aside: Koning also asserts that smaller firms do not mean fewer jobs. There’s a surge of new-firm entry, meaning that the growing number of smaller firms offsets the smaller number of jobs per firm.]
Practically, this means that companies earnestly trying to prepare themselves to be “AI-native” now have to do three things at once: identify their “operators,” give them more leverage, and rebuild the organization so it can learn from both endogenous and exogenous change.
Inside the business, more work has to become legible to software. Outside the business, the company has to keep adjusting to the frontier of new models and emerging agent capabilities. Customer expectations and constraints keep shifting too. The organization becomes a learning machine when it can absorb both kinds of information in real time and change its behavior accordingly.
The old paradigm organized companies around narrow human roles, then added management layers to coordinate the work between them. AI puts pressure on that arrangement because capable individuals can now do much more, while software handles more coordination.
Jack Dorsey and Roelof Botha articulated as much in a recent memo on hierarchy as an information-routing system. As more information moves through software and more execution moves through AI-mediated systems, some of the routing work that justified managerial layers becomes unnecessary. If the company wants to learn from endogenous and exogenous change quickly enough to react successfully, it has to reduce organizational drag between the people who understand what should change and the systems capable of changing it.
Brian Armstrong was pilloried for rebuilding Coinbase “as an intelligence, with humans around the edge aligning it,” but he’s probably right to do so. He recasts the company as a system that decides and acts on its own, with humans at the boundary where their judgment matters most.
AI expands what capable individuals can do and shifts more execution to software. Both weaken the coordination work that justified the old org chart. People are actually more important in this paradigm because companies will rely more heavily on their operators, but each individual organization will require fewer of them.
Much of the discourse about AI’s impact on jobs misses the point. It’s easy to claim superabundance or apocalypse, but both are extremely reductive. Although there will probably be an uneven and perhaps turbulent transition, AI-native companies need people who can steer the business as the frontier moves. But again, those people are rare.
Just as the industry is moving more quickly than the institutions tasked with regulating it, the systems in place to train AI-native workers are moving more slowly than the companies trying to hire them.
Mirror Image
Companies, however, are just the beginning. They respond to change first because the market forces them to. As we mentioned earlier, capitalism is one of the most efficient meta-learning machines we have.
But society metabolizes change more slowly because humans are at a speed disadvantage: our habits and institutions have long half-lives. The jobs debate is but an expression of what I’ve called the “Too Fast Threshold.” AI will eliminate some jobs and change many more, but the deeper shift is the spread of a kind of as-yet-undefined work wherever AI diffuses. If AI is inevitable, and I firmly believe it is, companies reorganizing as AI-powered learning machines is only the first step.
Like a lot of debates about AI, the direction is clear even if the resulting consequences are not. We will still need humans, but we will need them to do different things. Individuals can adapt for themselves but societies have to build institutions that help more people adapt at scale.
One of our next great challenges as a society is learning to produce enough people who can work within the AI reorganization. If AI changes the way we work, hustle’s not enough. We’ll need systems that help people understand this new technology and learn to work with it.
John F. Kennedy recognized a similar dynamic in 1961, signing a solution into law in ’62. At the time, automation was changing the labor market faster than many workers could adapt. Rather than just studying the problem, regulating automation, or flat-out waiting, his administration decided to train people for the work the economy would need. He is famously quoted as saying:
“The unemployed whose skills have been rendered obsolete by automation and other technological changes must be equipped with new skills enabling them to become productive members of our society once again.
Large scale unemployment during a recession is bad enough, but large scale unemployment during a period of prosperity would be intolerable.”
It is safe to say that AI will usher in an unprecedented period of prosperity, at least for those well-positioned to benefit from it. Despite capitalism’s ruthless efficiency, the answer to AI can’t be left entirely to the market if we want this to go well.
But institutions face the same problems the rest of us do. They typically develop systems to train people for existing work, but AI keeps creating and redefining work in real time. For better or for worse, the response probably has to resemble a learning loop too.
Regulation has the same defect. A rule is a fixed answer to a question the world has often already moved past. Legislators write for the conditions in front of them, and by the time a rule takes effect, the conditions have shifted. This is why Kennedy’s instinct holds up. He could have tried to regulate automation directly. Instead, he built a process to keep retraining workers as the labor market turned over, which proved far more resilient than a static rule would have. The regulations that survive a fast frontier work the same way. They build in their own revision, with a schedule for review and a live channel back from the people they govern, so the system can tell whether a rule still does what it was written to do.
I’ve written before that AI conjures the religious imagery of old: the Tower of Babel, Prometheus stealing fire, the Creation of Adam, and God creating humans in his likeness.
But AI may be so uniquely recursive as to complete the circle: we may be building “intelligence” in our likeness, only to find ourselves reorganizing our companies and institutions around its image. Our working lives change to match.















