Beyond the Boundary Condition
Algorithms, Intelligence, & Real ROI in AI
First of all: if any of you are attending NeurIPS, please let me know. I’ll be there from Dec 2 to 7.
If you’re a regular reader of Robot Wave, you know I firmly believe AI is more than just scaling data and compute. I think that NVIDIA is at risk of disruption (Exhibits A & B), that the current state of AI is economically untenable, and that algorithms are the new frontier.
Despite the headlines we see nearly every week, investing in more scale is what I’ve called “fighting the last war”: enormously effective (and profitable) for the past ~decade, but ill-suited for what’s next.
This piece will be dedicated to algorithms and our conviction at Nazare that going forward, algorithmic innovation will deliver superior return on investment per dollar spent. Let’s explore why.
To begin, I ended my last piece as follows:
The AI buildout probably leads to some form of AGI/ASI
AGI/ASI explodes the set of possible futures, which means enormous uncertainty
Uncertainty = both larger risks and rewards
No consensus exists on what comes after AGI/ASI
Investors largely avoid extreme uncertainty, because it exceeds their risk tolerance
Investors thus often cluster around consensus
The only current AI consensus is scale (chips, compute, data)
This consensus is now highly concentrated and over-capitalized
Over-investment in scale has already captured most available returns (diminishing marginal returns)
Algorithmic innovation remains critical for continued AI improvement (cost, performance, efficiency)
Algorithms remain underfunded and underexplored
Conclusion: Algorithms represent the next major investment opportunity in AI, occupying the same position that scaling frontier models held in 2022
Capacity, Capability, & Capital Efficiency
As mentioned above (and repeatedly in my writing), for the past decade capital has flowed into capacity expansion: chips, clusters, and now energy. This is where the economics of AI begin to invert.
We’re evolving to a stage where capability expansion will become increasingly attractive: new architectures, new optimization methods, and new algorithms, all of which proliferate independently of scale.
Capability squeezes more out of what currently exists, improving the status quo without requiring more GPUs, data, or energy. This is capital efficiency at scale. Algorithmic improvements compound across fleets, architectures, and workloads, meaning a single breakthrough could deliver significant returns at a fraction of the cost.
I often describe this dynamic as the shift from “how much compute can we deploy” to “how intelligently can we deploy it.” Every time an algorithm reduces the compute requirement of a given capability by even five or ten percent, it has the potential to shift billions of dollars in data-center economics.
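To make that arithmetic concrete, here is a minimal sketch. The $50B annual compute spend is a purely hypothetical figure chosen for illustration, not a number from any lab or report, and the function name is my own.

```python
def compute_savings(annual_compute_spend_usd: float, efficiency_gain: float) -> float:
    """Dollars freed up per year when an algorithm cuts compute needs by `efficiency_gain`."""
    if not 0.0 <= efficiency_gain < 1.0:
        raise ValueError("efficiency_gain must be in [0, 1)")
    return annual_compute_spend_usd * efficiency_gain


# Hypothetical fleet-wide spend, assumed only for illustration.
HYPOTHETICAL_SPEND_USD = 50e9

for gain in (0.05, 0.10):
    saved = compute_savings(HYPOTHETICAL_SPEND_USD, gain)
    print(f"{gain:.0%} less compute -> ${saved / 1e9:.1f}B saved per year")
```

The point is not the specific numbers but the shape: the saving scales with the size of the fleet, while the cost of discovering the algorithm does not.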
In fact, frontier AI labs are faced with a conundrum related to their capital expenditures, and Lynette Bye’s excellent recent Transformer article describes these dynamics in an elegant way. Frontier labs need to decide whether to spend money on…
Pre-Training (which is a scaling question)
Post-Training (reasoning, fine-tuning, and other improvements to already-trained models)
Inference (running the actual deployed models to serve customers)
Despite the enormous sums of money being raised (and spent), the compute demands on inference make dedicating the necessary compute to new, larger pre-training runs impossible. It goes without saying, but this is why the entire industry is so focused on scale and data center buildout. The claim is that all currently available GPUs are being used, so more need to come online in order to service both inference and pre-training needs.
In the meantime, labs have spent much time, energy, and money on Post-Training. Bye continues in her article: Researchers have found smarter ways to use their limited compute, which produced surprising improvements in models such as DeepSeek or Anthropic’s Haiku. The past year has seen the rise of the reasoning paradigm, where companies pour their compute into improving models that have already gone through resource-intensive pre-training. For a time, that has offered comparable performance gains for a tiny fraction of the cost.
One particularly salient example is Test-Time Compute, an algorithmic innovation that trades inference cost for better performance by having models “think” a bit longer, initially popularized by the release of OpenAI’s o1. Test-time compute is but one example of a distinct category of algorithmic breakthroughs focused on reasoning enhancement, which suggests there’s a significant, underexplored investment opportunity in reasoning-focused algorithmic innovations beyond traditional scaling approaches. And that is before mentioning other areas like training efficiency improvements, architecture innovations (which we’ll get to below), optimization methods, and memory or retrieval.
It almost goes without saying, but there is significant room for improvement here. Much of Nazare’s conviction in algorithms stems from the fact that frontier labs cannot do everything with any sort of capital efficiency or effective creativity. We’ve said it before but it bears repeating: these teams are extraordinarily talented and demonstrably effective. They cannot, however, do it all themselves (much as they – and their valuations – would like you to believe they can).
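One simple form of test-time compute is majority voting over several sampled answers (often called self-consistency). The sketch below is a toy under stated assumptions: `noisy_solver` is a hypothetical stand-in for a sampled model response, the 40% per-sample accuracy is invented, and none of this describes how o1 or any production system actually works.

```python
import random
from collections import Counter


def noisy_solver(correct_answer: int, p_correct: float, rng: random.Random) -> int:
    """Toy stand-in for one sampled model response (hypothetical, not a real model)."""
    if rng.random() < p_correct:
        return correct_answer
    # Wrong answers are scattered over several values, so they rarely agree.
    return correct_answer + rng.choice([-3, -2, -1, 1, 2, 3])


def majority_vote(correct_answer: int, p_correct: float, n_samples: int,
                  rng: random.Random) -> int:
    """Spend more inference compute: sample n times, return the most common answer."""
    votes = Counter(
        noisy_solver(correct_answer, p_correct, rng) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000, p_correct: float = 0.4,
             seed: int = 0) -> float:
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(
        majority_vote(42, p_correct, n_samples, rng) == 42 for _ in range(trials)
    )
    return hits / trials


for n in (1, 5, 25):
    print(f"{n:>2} samples per question -> accuracy {accuracy(n):.2f}")
```

The same underlying "model" gets more reliable purely by spending more compute at inference time, which is the trade the paragraph above describes.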
More explicitly: algorithms and the teams building them represent an opportunity for asymmetric returns. Small, dedicated teams could shift capability and/or efficiency frontiers in ways that the frontier labs cannot. (This is also why we believe in open-source and decentralization!)
It’s the AI equivalent of “zero marginal cost” software. Investors have long known the power – and potential for returns – of this dynamic with respect to SaaS, but it’s been ignored thus far for AI. Though efficiency is unpopular at the moment, at some point the bill comes due.
Power & Fragility
Power constraints are just one example of the opportunities that exist for algorithms to capture value. The subtle shift in language measuring data centers in “gigawatts” (energy) as opposed to chip or GPU (compute) counts reveals we’ve entered a regime where energy is the binding constraint. Even with the most determined leaders pushing up development timelines, the lead times for power infrastructure stretch across years in an industry increasingly used to massive improvements in months or weeks, creating natural ceilings on compute-scaling.
This reframes the entire competitive landscape, because at some point one of two things happens:
We hit real, physical obstacles in our infrastructure buildout
The bubble pops, and the thus-far-infinite capital financing the buildout disappears
With respect to the bubble popping, this would “rhyme” with previous tech cycles. The bursting of the dot-com bubble forced a move from proprietary to commodity hardware, meaning value shifted from vertical providers to cheap, easily available Linux-based solutions. (I was present at Sun Microsystems for this shift in real time.)
Additionally, some have noted that many of today’s investors have only ever known ZIRP and have never really experienced a bubble bursting, because most were not investing at the turn of the 21st century (the dot-com boom) or even during the GFC in 2008. Nobody present for those moments claims any sort of “badge of honor,” myself included, but having watched markets crash reminds you how fragile they can be and how quickly they can unwind, no matter the “narrative du jour.”
Either way, eventually the language will shift yet again to account for new optimal metrics, and I anticipate that they will resemble something like “joules per successful task” and/or “quality per watt.” When this becomes a reality, companies prioritizing efficiency rather than raw deployment (and the teams building the algorithms that provide that efficiency) will gain sustainable advantages as energy constraints tighten.
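As a rough sketch of what a metric like “joules per successful task” could look like in practice, consider the following. The `RunStats` type, both systems, and all of their numbers are hypothetical, chosen only to show how the metric can rank systems differently than raw success rate does.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Measurements from one evaluation run (hypothetical schema)."""
    energy_joules: float
    tasks_attempted: int
    tasks_succeeded: int


def success_rate(stats: RunStats) -> float:
    return stats.tasks_succeeded / stats.tasks_attempted


def joules_per_successful_task(stats: RunStats) -> float:
    """Total energy divided by tasks that actually succeeded; failures still cost energy."""
    if stats.tasks_succeeded == 0:
        return float("inf")
    return stats.energy_joules / stats.tasks_succeeded


# Invented numbers: a large model vs. a smaller, more efficient one.
large_model = RunStats(energy_joules=3.6e6, tasks_attempted=1000, tasks_succeeded=900)
efficient_model = RunStats(energy_joules=1.2e6, tasks_attempted=1000, tasks_succeeded=800)

for name, stats in (("large", large_model), ("efficient", efficient_model)):
    print(
        f"{name:>9}: success rate {success_rate(stats):.0%}, "
        f"{joules_per_successful_task(stats):.0f} J per successful task"
    )
```

In this made-up comparison the larger system wins on success rate but loses on energy per successful task, which is the kind of reordering an energy-constrained market would reward.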
Evidence that we’re in an “AI Bubble” abounds, but specifically with respect to the infrastructure buildout, power as the ultimate bottleneck is the strongest indicator that pure compute strategies continuing at their current clip will eventually be unsustainable. We’ll soon need to do more with less (or at least with what already exists).
Algorithms Worth Our Attention…or Not.
We can bucket algorithms into two broad categories: transformer-based and transformer-less. (This is a broad overview, but for those interested in more detail, please feel free to reach out!)
Transformers revolutionized AI by replacing recurrence with attention – a mechanism that allows models to look at every part of an input simultaneously. Transformers use self-attention to compute relationships between all tokens (words, pixels, data points) in parallel. This enables contextual understanding over long ranges and scales extremely well with compute.
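For readers who want to see the mechanism, here is a minimal single-head sketch of scaled dot-product self-attention in plain Python. It assumes queries, keys, and values are all just the input vectors; real transformers apply learned projections first, and production code uses tensor libraries rather than nested lists.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens.

    Every token scores itself against every token, so the work grows with
    the square of the sequence length.
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output is a weighted average of all token vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(x, 3) for x in row])
```

The all-pairs scoring loop is both the source of the long-range context and the source of the quadratic cost that the alternative architectures try to avoid.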
Transformer-less algorithms refer to architectures that don’t rely on self-attention as their core computation. Classic deep learning architectures include CNNs for spatial data processing, RNNs/LSTMs/GRUs for sequential modeling, autoencoders for feature learning, and GANs for generation through adversarial training. These approaches excel at handling local patterns and structured data with lower memory requirements, but they face limitations in capturing long-range context and suffer from slower sequential processing during training.
Importantly, none of this means they’re useless. In fact, although they don’t make headlines, there is an emerging “Post-Transformer-Based Model” movement with teams attempting to replace or augment attention-based computation. Despite their impressive quality, transformers do present certain inefficiencies, and the goal here is to maintain or exceed transformer performance with less compute and longer context – doing more with less.
Examples include State Space Models and related sub-quadratic sequence architectures (Mamba, S4, Hyena, RWKV, and the models from Liquid AI) that achieve linear or near-linear-time processing of long sequences, often through continuous state representations; Recurrent Memory Transformers that combine transformer attention with structured memory mechanisms; Mixture-of-Experts (MoE) systems (indeed, I contributed to this field of research) that dynamically route inputs between specialized sub-models; and Graph Neural Networks designed for relational reasoning over structured data.
These approaches address specific limitations in sequence modeling, memory efficiency, computational scaling, and structured reasoning tasks. Returning readers will understand that algorithms are the critical foundation for a MACHA future in which AI is cheaper, more efficient, more performant, more private, and more useful.
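The routing idea behind Mixture-of-Experts can be sketched in a few lines. This is a toy under stated assumptions: the gate logits are passed in directly (in a real MoE layer they come from a learned router), the “experts” are trivial hypothetical functions rather than neural sub-networks, and the names are my own.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, gate_logits, experts, k=2):
    """Run only the k selected experts and blend their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))


# Hypothetical experts; real ones would be feed-forward sub-networks.
experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x, lambda x: x ** 2]
gate_logits = [0.0, 5.0, 1.0, -2.0]  # assumed router output for this input

print(route_top_k(gate_logits, k=2))
print(moe_forward(3.0, gate_logits, experts, k=2))
```

Because only `k` experts run per input, total parameters can grow with the number of experts while per-input compute stays roughly flat, which is the efficiency argument for this family of models.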
One concession to be made, however, is that this massive buildout of AI infrastructure has been optimized for Transformer-based (attention) models. That means alternative models – though potentially enormously powerful and valuable – may also be at a structural disadvantage given the current hardware. As I mentioned earlier, this is not new and the market can (and will) adapt. Nevertheless, there is progress to be made not only in algorithms themselves, but in what surrounds them as well.
The Vibe Shift & Infra Making Algos Easier to Use
To finish, let’s now consider a few perspectives from leading AI personalities that demonstrate a shift in the status quo. Dwarkesh Patel interviewed Andrej Karpathy recently, so let’s start there. It’s worth a watch, and it’s interesting to note Karpathy’s insistence that algorithms are critical to continued improvement.
The interview illustrates, among other things, how algorithmic foundations often prove to be as important to AI breakthroughs over time as raw compute. Karpathy specifically notes failed early agent projects that had substantial computing resources but lacked proper representational architectures (algorithmic foundations that language models later provided), and he argues that current-generation agents are insufficient for most tasks because of the limitations of current frontier models.
What’s more, he notes how much useless internet slop (both human- and machine-generated!) is in the pretraining data for all of the frontier models, and why we need to filter it out. That filter will almost certainly take the form of a powerful, elegant algorithm with excellent proxies for determining quality and eliminating noise.
Further, he thinks we need better models that rely much less on a “memorized” knowledge base and instead act more like humans by digesting information into pattern recognition, dynamically updating our cognitive algorithms, if you will. He expects that this will take improving current models as well as developing complementary algorithms to build out more complete systems of cognition. In his telling, it’s like we’ve got an incomplete brain right now (e.g. a prefrontal cortex but no amygdala). Novel algorithms will help build out these systems and make them more capable.
It’s also apparent to me that simply iterating upon current frontier models is not going to deliver the kinds of AI capabilities the big labs are promising. As impressive as AI capabilities are today, the history of AI is not just one of transformer-based LLMs, and it won’t end with them either. Novel algorithmic development is essential to expand the capabilities of agentic AI.
In other news, Clement Delangue (Hugging Face) recently posted about a “paradigm shift in AI,” inspired by Tinker from Thinking Machines and Nanochat from Karpathy. Delangue isn’t specifically referring to algorithms here, but they form the foundation beneath everything he describes in his post and are critical to its success.
The ecosystem itself is starting to move from generalist, centralized models to smaller, specialized, and open-weight experimentation, all of which will require algorithmic improvements to move forward.
Nanochat, Tinker, vLLM, SGLang, and Prime Intellect are all examples of lightweight frameworks or algorithmic optimizers that make training, inference, and fine-tuning dramatically cheaper, more performant, or more efficient. The speed of all this is frankly incredible, too. Tools like Tinker and tinygrad let researchers express fine-tuning loops and distributed training runs in just a few lines of code, democratizing experimentation.
The Bottom Line
From an investment perspective, algorithmic innovation represents an under-appreciated opportunity hiding in plain sight. The majority of capital remains concentrated in hardware, hyperscale infrastructure, and foundation model training.
The more efficient the algorithmic layer becomes, the more exposed those infrastructure valuations are. Efficiency erodes moats, compresses margins, and redistributes power from hardware vendors to software innovators.
What’s more, the next generation of agentic AI will almost certainly depend on novel algorithms that make these systems more capable: more adaptive, more creative, more elegant at problem-solving. Reasoning, test-time compute, architecture diversification, optimization, memory, and retrieval are all areas in which we can improve.
Taken together, we can infer that AI is beginning to explore what it means to be “post-scale,” where infrastructure inputs are no longer the only lever for improvement (and may actually encounter significant obstacles). Algorithms will play an important role in what comes next, and they will drive portfolio returns as well.




