Robot Wave 🤖🌊

I promise not to use AI to generate this newsletter

Oct 23, 2024

I’m resurrecting this Substack after many years. In the past I wrote about decentralized privacy, which is still a big interest of mine through my ongoing work with Orchid Labs, but in this newsletter I will be writing about my recent journeys in AI.

My journey in AI began in 1993 when I started a PhD researching neural networks for speech recognition. I was fascinated by the idea of creating a machine that could think and help us with our lives. After graduating I worked with NASA Ames Research Group in Mountain View for 3 years developing AI models for data such as space junk, wind tunnel modeling and Mars rover data analysis.

In 2000 I moved on from AI after realizing that the majority of the industry was focused on “mining humans” through the use of ad tech. My work with Orchid Labs has been an attempt to counter the rise of surveillance capitalism, which is often powered by AI models.

In 2020 I began to notice some early signs of hope in the use of AI for different purposes. The seminal transformer paper had led to generative models such as Stable Diffusion and ChatGPT. The world of AI had suddenly become exciting again and I became fascinated. Earlier this year I started a new fund, Nazaré Ventures, to focus on AI. On a personal level, I’m concerned about a world where a few large entities control the training and execution of the most powerful AI systems ever created.

This newsletter is intended to be a helpful summary and guide to the things I’m seeing in AI recently and as a conversation starter. I consume vast amounts of news every week on X and other channels and talk to many founders and investors in the space. I hope sharing things here is helpful to you. Please send me your feedback!

Research topic of the week : Distributed Training

Neural networks have never been “fast” or easy to train. My PhD focused on fast training for speech models in the 90s. In the last 10 years the state of the art has moved rapidly for Large Language Models, largely based on the transformer architecture. By combining more data and more compute, incredible scaling of performance has occurred. This has led to an insatiable demand for more data and more compute leading to $1bn seed rounds.

One of my current interests is how to scale AI models, data and compute. Of course I’m not alone in this. Many startups and researchers have great ideas, and I am confident we will see more rapid advances in the next few years. One fascinating development in the last few months has been the emergence of training algorithms which work across a distributed or decentralized network of compute.

As background, in simple terms modern generative AI systems have 3 main components:

training data: huge volumes of text and labeled image or video data, which provide the raw material to “teach” the connections in the neural network models how to generate text, image or video.
hardware: large collections of specialized processing machines, typically using the highest end GPU chips (e.g. Nvidia H100s).
models: highly interconnected neural networks, or combinations of networks, using recurrent and attention architectures and containing billions of weights, e.g. GPT-3, the model family behind the original ChatGPT, has 175 billion weights.

As a result of the evolution of state of the art AI systems, there are perhaps 10 entities in the world that can train “Frontier models” such as ChatGPT and Claude, because doing so requires a combination of access to training data, software to train the models and, most importantly, financial resources. “Grok”, from Elon Musk’s xAI, was reportedly trained using 100,000 H100 GPUs, with a total investment cost of $3bn and an annual running cost of $100m.

How would an AI researcher not working at the large AI companies participate in this world? How can AI researchers develop real open source training algorithms?

The rise of open source AI models such as Llama and Mixtral, in which the weights of the AI models are made freely available, has catalyzed the open source research community. These models are not truly open source, however: for example, neither the training data nor the code to train the models is released. Platforms like HuggingFace also enable researchers to collaborate on the latest models and datasets.

The architecture and training algorithms of standard LLMs have specific requirements. During training, there needs to be extremely high-bandwidth connectivity and a network design that supports fast node-to-node communication.
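To make the bandwidth requirement concrete, here is a rough back-of-envelope sketch in Python. Everything in it is an illustrative assumption rather than a measurement: a 10 billion weight model, 2 bytes per weight, 8 workers, the textbook traffic estimate for a ring all-reduce, and two example link speeds (400 Gbit/s for a datacenter interconnect, 1 Gbit/s for a good internet connection). The function name `allreduce_seconds` is made up for this example.

```python
def allreduce_seconds(num_params, bytes_per_param, num_workers, link_gbit_per_s):
    """Estimate the time for one full gradient synchronization.

    Assumes a ring all-reduce, where each worker sends roughly
    2 * (N - 1) / N times the gradient size, and ignores latency,
    compression and overlap with computation.
    """
    payload_bytes = num_params * bytes_per_param
    traffic_bytes = 2 * (num_workers - 1) / num_workers * payload_bytes
    link_bytes_per_s = link_gbit_per_s * 1e9 / 8  # Gbit/s -> bytes/s
    return traffic_bytes / link_bytes_per_s


# Illustrative numbers only (all assumptions, see the text above).
params = 10e9   # 10 billion weights
workers = 8
for name, gbit in [("datacenter link, 400 Gbit/s", 400), ("internet link, 1 Gbit/s", 1)]:
    seconds = allreduce_seconds(params, 2, workers, gbit)
    print(f"{name}: about {seconds:.1f} s per synchronization")
```

Under these assumptions a single synchronization is hundreds of times slower over the internet link than over the datacenter link. Standard training synchronizes at every step, which is why algorithms that communicate far less often are so interesting.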

In terms of hardware, we now have many examples of >100k GPUs available in decentralized clusters such as Vast.ai and various crypto-incentivized platforms.

One of the first to focus on decentralized training was Gensyn, and Jeff Amico recently wrote a post titled “Why the Future of Training is Decentralized”. Nous Research recently announced the preview of its decentralized training platform “DisTrO”, and Prime Intellect is performing a decentralized training run of a 10B parameter model. Many of the approaches are inspired by the DiLoCo paper.
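The core idea behind DiLoCo-style training can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method from the paper or from any of the platforms above: the "model" is a single weight fitted to noise-free data from y = 3x, each worker runs plain SGD locally (DiLoCo itself uses AdamW for the inner steps), and the outer update uses simple momentum (DiLoCo uses Nesterov momentum). The function names, shards and hyperparameters are all made up for this example.

```python
def local_sgd(w, shard, lr, steps):
    """Train on one worker's data shard with no communication at all."""
    for i in range(steps):
        x, y = shard[i % len(shard)]      # cycle through the local data
        grad = 2.0 * (w * x - y) * x      # gradient of (w*x - y)^2 w.r.t. w
        w -= lr * grad
    return w


def diloco_style_train(shards, rounds=30, inner_steps=50,
                       inner_lr=0.01, outer_lr=0.7, momentum=0.5):
    """Many cheap local steps, then one rare synchronization per round."""
    w_global = 0.0
    velocity = 0.0
    for _ in range(rounds):
        # Each worker starts from the shared weights and trains independently.
        local_ws = [local_sgd(w_global, shard, inner_lr, inner_steps)
                    for shard in shards]
        # The only communication: average the "pseudo-gradient",
        # i.e. how far each worker moved away from the shared weights.
        pseudo_grad = sum(w_global - w for w in local_ws) / len(local_ws)
        # Outer optimizer step (plain momentum in this sketch).
        velocity = momentum * velocity + pseudo_grad
        w_global -= outer_lr * velocity
    return w_global


# Two workers, each holding a different shard of data from y = 3x.
shards = [
    [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5)],
    [(-1.0, -3.0), (2.0, 6.0)],
]
w = diloco_style_train(shards)
print(f"learned w = {w:.4f} using 30 synchronizations for 1500 local steps per worker")
```

The point of the design is the ratio: in this sketch the workers talk to each other once every 50 local steps instead of at every step, which is what makes slow links between machines tolerable.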

Application of the week : Google Notebooks

A surprise release from a small team at Google is Google Notebooks (officially named NotebookLM). This LLM based tool allows you to link documents or URLs to create a resource you can have a text conversation with. In addition, a podcast can be made from the resource, in which two speakers tell you about the content in a chatty style. In this viral podcast, someone gave the Notebook an article about AIs realizing they are not human and the results were pretty amusing. Here is a Notebook I generated with information about this week’s topic of decentralized training.

Nazaré Ventures

Finally, an introduction to my new project: Nazaré Ventures. Nazaré is a small town north of Lisbon. Most of the year there are no waves at all for surfing. Once a year the Atlantic Ocean throws a massive swell in just the right direction to hit the underwater canyon and generate the largest surfable waves in history.

I’m a good surfer but not this good. What I am good at is surfing waves of innovation. I believe the new wave of AI is a huge one which will be transformative across all aspects of our lives. Nazaré Ventures is an early stage venture fund focused on AI infrastructure.

For the last 11 years I’ve been investing in and building companies focused on decentralized technologies, starting with Bitcoin in 2013 at Pantera Capital, then Ethereum with Orchid Labs in 2017, and many more over the last few years as an angel investor.

My academic background in AI includes my PhD research in Mixtures of Experts & Recurrent Neural Networks for Speech Recognition from Cambridge (1997). In a future post I’ll make a list of downloadable PDFs for the historically curious. A list of my work is available here.

Nazaré Ventures represents a culmination of my work since 2013, combined with my original studies in AI. If you’re interested in learning more please get in touch. Nazaré Ventures Fund I is focused on early stage investments in AI infrastructure. Our thesis is that a combination of existing and new technologies will reinvent AI infrastructure and enable more cost effective applications to be built. In addition, we believe the use of decentralized and novel infrastructure will enable new kinds of applications to be built and open up new markets.

I will be traveling and attending a number of events in the next few months. Please reach out if you are in any of the locations at the same time.

October 20-25 : San Francisco : TED AI (here now!)

October 28-30 : London

November 9-10 : Lisbon : Crypto/AI conference

November 11-14 : Lisbon : WebSummit

December 10-12 : Vancouver : NeurIPS Conference

Thank you

Steven Waterhouse

https://linkedin.com/in/deseventral

https://nazare.io

https://x.com/deseventral
