Dear friend,
Welcome to another edition of Robot Wave, where I explore trends in AI.
This has been a big week for AI. Not only is 2025 shaping up to be the year of the agents, but it may also be the year that we Make AI Cheap Again (MACHA)?
Late last year a Chinese company, Deepseek, released a new model that it claimed used significantly less compute to train than similar models from OpenAI and Anthropic. The latest release, on 20th Jan 2025, claims results similar to OpenAI’s o1 models, again with significantly less compute for training. So what’s going on?
Let’s first rewind a bit. I first introduced the concept of MACHA a year ago. What struck me at the time was how the entire AI industry was moving in one very similar direction: just add more compute, more data and of course more $ to the same architecture. Innovation on the model itself (an area I was trained in through my research in the 90s) was mostly ignored in favor of more compute and more data (as a side note, my PhD was focused on scaling neural network training using Mixtures of Experts). The concept of a “scaling law”, where better performance comes from adding more data and more weights / compute to a system, has resulted in fantastic performance from companies such as OpenAI and Anthropic, but also in fantastical investments (a $1bn seed round for Ilya’s SSI company, xAI’s 100,000 H100 cluster, etc.).
“The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.”
The entire enterprise seems akin to a modern-day AI Tower of Babel: a literal race to the top to meet God (AGI).
“Come, let us build ourselves a city and a tower with its top in the heavens, and let us make a name for ourselves; otherwise we shall be scattered abroad upon the face of the whole earth.” The Lord came down to see the city and the tower, which mortals had built. And the Lord said, “Look, they are one people, and they have all one language, and this is only the beginning of what they will do; nothing that they propose to do will now be impossible for them.” (Genesis 11:1–9)
Commodity hardware and software eats the world
Back in the year 2000, after finishing 3 years of consulting work at NASA Ames and exploring dotcom ideas, I joined a small company called Infrasearch, which was started by some of the early developers of Gnutella, an open source file sharing tool. Infrasearch developed search technology based on the ideas of Gnutella and the then-new XML-RPC technologies. We tried to raise venture funding, and even though our seed investors included Marc Andreessen, we were ultimately acquired by Sun Microsystems in 2001.
At Sun I initially ran the P2P research team, JXTA, and then started a new project focused on archival storage using clusters of commodity AMD servers and commodity hard drives, connected by self-repairing, self-managing software.
What’s the connection to today, you may ask? Well, at the time Sun made the majority of its money from selling large, expensive servers running closed source operating system software. That money came from a small number of customers, and Sun had been extremely successful in the dotcom boom along with Oracle and EMC. During the dotcom boom I had been working with open source web software and operating systems such as Linux, MySQL and PHP running on commodity servers. These approaches were often eschewed by investors and the startup MBA tourists who ran many of the dotcoms. In the end the scrappy commodity hardware and software designs were the winners. The impetus for the change, however, was a financial shock in the form of the dotcom bubble bursting. This moment led IT managers to question the cost of machines from Sun and others and instead try machines from Dell or other players running Linux and the open source web stacks. A similar event occurred after the 2008 GFC, leading to a move away from self-hosted infrastructure and towards cloud computing.
Deepseek
I believe a similar event is happening now. Deepseek has blown open the door to a Pandora’s box of open source AI technologies, one first opened in 2023 by Meta’s “leak” of Llama, its open source LLM.
Our investment in Vast tracked this change last year. Vast provides mostly consumer-grade Nvidia chips in a GPU rental market. Vast is growing rapidly, and while most applications are focused on inference, new technologies such as Deepseek and distributed training systems such as Nous Research, PrimeIntellect (Nazare portco) and Gensyn may end up disrupting the current training paradigm. This well-written piece presents the hardware and software case against Nvidia (more), whilst Musk and others are questioning whether Deepseek was really trained using so few GPUs. The code is open source and some researchers are already working to reproduce the results.
Deepseek combines 2 things we are already familiar with (see this review also):
Mixtures of Experts (MoE): architectures in which only a subset of the billions of weights is switched on for any given input during training and inference, for efficiency.
Quantization and compression: using lower-precision numbers and a compressed KV cache to make training and inference more efficient.
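For readers who want a concrete picture of the Mixtures of Experts idea, here is a toy sketch in plain Python. It is my own illustration under simplifying assumptions, not Deepseek's code: the names `make_expert`, `moe_forward` and `gate_weights` are hypothetical, each "expert" is a trivial scaling function standing in for a full feed-forward block, and the router is a simple dot product followed by a top-k selection. The point it shows is that only `top_k` of the experts do any work for a given input.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a scaling function standing in for a feed-forward block."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Router score per expert: dot product of the input with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; every other expert stays switched off for this input.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalize the scores of the chosen experts into mixing weights.
    probs = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)  # only the chosen experts are ever evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, chosen


# Example: 8 experts, but only 2 are evaluated for this input.
random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(i + 1) for i in range(num_experts)]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
x = [0.5, -1.0, 2.0, 0.1]
output, active = moe_forward(x, gate_weights, experts, top_k=2)
print(f"active experts: {active} out of {num_experts}")
```

In a real model the experts and the router are learned neural network layers, and the saving comes from the fact that the compute per input scales with `top_k` rather than with the total number of experts.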
The final innovation is to combine these first 2 optimizations with the chain-of-thought reasoning of OpenAI’s o1 and a reinforcement learning approach. This final optimization seems to be the technique that results in the reported dramatic reduction in the cost of training, to about $5.6m (compare this to the 100k H100s of xAI).
Naturally people are skeptical, and many researchers are attempting to reproduce the results. Some people are suggesting it’s a Chinese psyop and that China has H100s in its possession, in violation of the US sanctions.
I am in the camp that this is a real innovation, not a fake one. However it plays out, the genie is out of the bottle: the perception now is that we can do more with less. We can train and run large models using less hardware, potentially commodity hardware, and without the incredibly high cost of investing in hardware infrastructure.
What’s next?
Do the massive clusters of Nvidia’s most powerful chips now become irrelevant? Not necessarily. Even as the LLM market changes dramatically with less reliance on compute, the quests for ASI and AGI – humanity’s Babylonian pursuit of finding “God” – will likely still be dominated by throwing maximal resources at the challenge. Over time, however, I expect that software improvements will keep strong moats from developing in these areas just as we’re beginning to see happen with LLMs. This is now a brave new world. What was taken for granted is being questioned, prior assumptions are being challenged, and new possibilities abound.
Till next time,
Steven
Dr. Steven Waterhouse
Founder and General Partner,






