Beyond NVIDIA: What $4 Trillion Doesn’t Buy
How software eats the world again.
On July 9, 2025, NVIDIA became the first public company to cross a $4 trillion market capitalization. On the heels of this milestone, I thought a reflection was in order.
NVIDIA has become the face of the current AI era. Its GPUs power nearly every major model in production today. CUDA provides the glue between hardware and modern machine learning. As an investment, NVIDIA is “selling shovels in a gold rush,” and its market cap has surpassed those of the largest companies in history, a testament to the world’s appetite for compute.
But AI’s future won’t be determined by the same constraints and optimizations that defined its beginning. As we move deeper into the practical deployment of AI systems, we’re confronting new friction points – ones that NVIDIA isn’t necessarily positioned to solve.
From Brute Force to Bottlenecks
The past decade of AI progress has been defined by scale. Larger models, more data, bigger clusters. NVIDIA dominated this phase by offering the best combination of parallel compute throughput, developer tooling, and supply chain control. The CUDA lock-in was part of a tightly managed integration between hardware and software that made it nearly impossible for alternatives to compete.
This scale era is peaking. Nearly every model is trained on similar infrastructure, using similar techniques. The largest buyers of NVIDIA chips today are hyperscalers (read: The MAG 7), who are beginning to design their own silicon to improve unit economics and reduce reliance on external vendors. More fundamentally, brute-force scaling appears to be delivering diminishing returns. (Aside: Today The Information published a piece on Huawei designing a chip to compete with NVIDIA)
What emerges next is central to our thesis: a shift away from sheer scale toward smarter system design where coordination, memory, and algorithmic innovation matter more than raw FLOPs.
Coordination, Memory, and Algorithms
As the megalabs continue scaling and competing requires insatiable amounts of money, attention is shifting to three undercapitalized frontiers: coordination, memory, and algorithmic innovation. Each addresses a different structural limitation of today’s AI architecture.
Coordination refers to the increasingly difficult task of orchestrating multiple systems – clusters of models, chains of tools, fleets of agents. Unlike the centralized training paradigm that characterized early LLMs, modern AI applications are becoming distributed. They involve many models, often specialized, working together across asynchronous timelines. This introduces new coordination overhead: latency, context loss, failure modes. These aren’t solved by adding more FLOPs.
Memory is the ability to persist, retrieve, and compose past context across sessions and tasks. Most current models are stateless. They operate in a fixed window, forgetting everything between calls. Building truly agentic systems will require persistent memory that can be scoped, verified, and reused across workflows. This is more than just a data storage issue. It requires deliberate architecture design.
Algorithmic innovation is perhaps the least celebrated but arguably the most impactful axis of change. Recent advances – like AlphaEvolve, RWKV, and other sparse or recurrent architectures – suggest that smarter models can outperform larger ones when optimized for task specificity and training efficiency. Synthetic data, curriculum learning, reinforcement learning, and new forms of reasoning all represent ways to improve capability without expanding infrastructure. These advances attack the assumption that size is the only route to better intelligence. It’s not just about building a bigger brain, it's also about getting many brains to work together. In Nature we see swarm or hive intelligence working in insects, and in organizations we observe team work where experts in specific areas are assigned tasks.
Nazaré believes these frontiers represent the next investable surface in AI. Each addresses a core bottleneck left unresolved by the scale era. Each reshapes the stack in ways that deprioritize vertical hardware-software integration, opening space for new design primitives.
Decentralized Algorithms
As AI systems scale, they encounter new forms of friction. The need to coordinate across distributed models, preserve memory beyond a single call, and improve performance without escalating infrastructure costs has created pressure to rethink core design assumptions. One response has been to decentralize, not as a philosophical gesture but as a practical architecture for meeting these emerging constraints.
TinyCorp, founded by George Hotz, designs AI infrastructure with simplicity as the organizing principle. Its neural network framework, tinygrad, reduces complexity by limiting the operation set and abstracting away hardware-specific dependencies. This enables support for a wide range of accelerators with minimal engineering overhead. Features such as lazy evaluation, dynamic compilation, and kernel fusion provide meaningful performance improvements without relying on proprietary toolchains. Paired with its cost-effective hardware platform (tinybox), TinyCorp demonstrates how algorithmic innovation can lower both the barrier to entry and the cost of execution.
Prime Intellect has built an integrated system for decentralized AI execution. Training is distributed across networks of heterogeneous nodes. Inference is designed to run on-device or near-edge, reducing latency and central dependency. Reinforcement learning modules operate in environments without centralized oversight, and synthetic data generation complements training when access to large proprietary datasets is not feasible. The company has released production-grade models and continues to attract technical and financial validation for its ability to match performance with architectural flexibility.
Nous Research advances the case for open, community-driven model development. Its infrastructure supports multi-node training, modular model design, and efficient fine-tuning methods. These efforts show that high-quality model development does not require hyperscale infrastructure when software systems are optimized for coordination and reuse. Nous has contributed not only models, but also techniques for aligning them more efficiently across varied training environments.
Intelligent Internet proposes a broad, decentralized framework that spans data ownership, model training, and economic coordination. Built on the premise that data sovereignty and incentive design are integral to scalable intelligence, it seeks to build a fully distributed AI network that assigns value to participation and performance. This includes mechanisms for routing, learning, and governance that operate independently of centralized control. The project is ambitious in scope and grounded in the belief that coordination and computation must evolve together.
These projects reflect a broader shift in AI system design. Distributed training expands access to compute. Local inference reduces latency and dependency. Reinforcement learning adapts more effectively in complex environments when freed from centralized constraints. Synthetic data increases the adaptability of models to new domains.
Together, these approaches offer infrastructure designed around coordination, persistence, and efficiency. They do not aim to replicate the centralized training stack. They replace it with something structurally different. The early results are already visible. Performance gains are emerging not from more of the same, but from architectures better suited to the shape of the problem.
The Sun Microsystems Analogy
Once again, I’ve seen this play out before. At the height of the dot-com boom, I joined Sun Microsystems after its acquisition of Infrasearch. Sun, like NVIDIA today, was the default infrastructure provider for the internet’s first major growth phase. It sold high-margin, vertically integrated servers to a small set of high-spending customers. Java, Solaris, and SPARC were household names in computing.
But Sun failed to adapt. Open source software, commodity hardware, and new deployment patterns disrupted its core business. Investors were slow to accept this change, but once the dot-com bubble burst, the search for lower-cost, more modular alternatives accelerated. Clusters of AMD machines running Linux and PHP quietly outperformed their heavyweight predecessors. The shift was architectural and irreversible.
Let’s be clear: NVIDIA is not Sun. But it may be closer than it appears. Its margins, market share, and investor narrative all rely on assumptions built during the scale era.
If coordination, memory, and algorithmic efficiency emerge as the next bottlenecks, the locus of innovation will shift away from monolithic GPU clusters and toward more distributed, heterogeneous systems. Financial shocks, such as rising energy costs, trade instability, or a slowdown in AI ROI, could accelerate this turn.
Next-Level Intelligence
We are at the beginning of AI’s next phase. Investor enthusiasm still overwhelmingly rewards throughput, volume, and lock-in, but this is what I call “fighting the last war.” These were the right heuristics for the first wave. They are insufficient for what comes next.
The systems being built now will look more like networks than pipelines. They will learn continuously, not episodically. They will be optimized for outcome, not benchmark. And they will demand new primitives: memory that lives outside the model, agents that can self-coordinate, and algorithms that squeeze more from less.
Much of this stack remains undefined. That’s precisely the opportunity.
As algorithmic innovation starts to outrun hardware progression, the advantage shifts to those building leaner, smarter, more adaptable systems. That advantage may not belong to today’s largest incumbents. It may belong to a new generation of builders who understand that the future of AI won’t be decided by how big your cluster is, but by how well your systems think together.






