The guy’s eyes flicked up at me as I took the seat next to him.
All around us were engineers doing exactly what he was doing — coding.
Without missing a beat and without his fingers slowing down, we started to talk.
He was on his third startup: one had collapsed, one had IPO-ed, and the one he was currently building had raised its Series B from the flashiest names in the industry. And now he found himself at the AI Engineer World's Fair.
He embodied something I saw many times at the conference: people and technology intertwining in ways that are becoming harder and harder to tell apart.
As he held a conversation with me while seamlessly executing programming tasks (writing functions, debugging, adding comments), it struck me that, in some ways, he was a human acting like a machine. But what I saw even more of throughout the conference was machines acting like humans.
In the same week that Waymo launched driverless vehicles to the public, it felt like the conference was getting at something we've all been feeling: software is getting smarter, and our interactions with it are becoming more intuitive.
And with that in mind, I’ll get into some highlights from the talks.
—
The AI Engineer World’s Fair was a three-day conference in SF filled with incredible speakers and amazing nerd energy. It would be impossible for me to recap all the different tracks and topics — so I thought I’d summarize three talks that stood out to me.
Picture of Jeff and me representing Chapter One at the conference!
State Space Models for Realtime Multimodal Intelligence: Karan Goel - CEO of Cartesia.ai
Speaker: Karan Goel
YouTube video: Keynotes & Multimodality track (time: 3 hours, 14 minutes)
Highlights:
He started the talk with the idea that for the past 3-4 years, AI has been focused on batch intelligence, but as we build AI systems that can reason about a problem for a long time and solve it, there will be many applications that need systems that stream.
The shift is toward real-time applications where you're constantly querying a model, getting results back at low latency, and using them to generate new information. The use cases include conversational interfaces, always-on assistants, world generation in games & VR, AI across devices, and robotics. And many of these applications will run on-device.
The team is building efficient, long-context, real-time foundation models on top of state space model technology.
Key takeaway: The core idea is that a model should be able to compress information as it streams in, and that this is the basis for building powerful systems that are streaming at their core. Their hypothesis is that new architectures are needed to enable small, fast, and capable models, and that transformers handle this inefficiently; the fundamentally efficient architectures will have compression at their core. State space models (SSMs), the architecture line developed by their team, work on the intuition that tokens stream in, update the model's state, and are discarded. Because raw tokens aren't kept around and the state stays compressed, generation runs in linear time. SSM variants that have started to be adopted in the last few months include HiPPO, LSSL, S4, S4D, S4ND, Hyena, and Mamba.
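To make that intuition concrete, here is a minimal sketch of the linear state-space recurrence in numpy. The toy matrices are my own illustrative stand-ins; real SSMs like S4 and Mamba use learned, structured dynamics (and Mamba's are input-dependent).

```python
import numpy as np

def ssm_stream(A, B, C, tokens):
    """Process a stream of token embeddings one at a time.

    Each incoming token updates a fixed-size state and is then discarded,
    so memory stays constant and total work is linear in sequence length,
    unlike a transformer, which keeps every past token in its KV cache.
    """
    state = np.zeros(A.shape[0])
    for x in tokens:                 # tokens stream in...
        state = A @ state + B @ x    # ...update the compressed state...
        yield C @ state              # ...emit an output; the token is dropped

# Toy usage: 4-dim state, 2-dim inputs/outputs, random dynamics.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) * 0.1
B = rng.normal(size=(4, 2))
C = rng.normal(size=(2, 4))
for y in ssm_stream(A, B, C, rng.normal(size=(5, 2))):
    print(y)
```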
Compute & System Design for Next Generation Frontier Models: Dylan Patel - Chief Analyst, SemiAnalysis
Speaker: Dylan Patel
YouTube video: GPUs & Inference Track (time: 31 minutes)
Highlights:
The talk focused on inference and running frontier models. He argued that people have been talking about stagnation because the models everyone is using today are essentially the same ones trained in 2022, but he believes we'll see the frontier grow by a few more orders of magnitude in the near future.
He focused his talk on inference, since most people in the room will be using models rather than training them. In the context of GPT models, he broke down the inference requirements for pre-fill, decoding, continuous batching, and disaggregated batching.
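As a rough illustration of why pre-fill and decode stress hardware differently, here is a back-of-envelope sketch. The model dimensions below are assumptions of mine (roughly 70B-class), not numbers from the talk.

```python
# Hypothetical model dimensions (assumed, roughly 70B-class with grouped-query attention).
n_layers, n_kv_heads, head_dim, bytes_fp16 = 80, 8, 128, 2

# KV cache per token: keys + values, across all layers.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16
print(f"KV cache: {kv_bytes_per_token / 1024:.0f} KiB per token")

# Pre-fill processes the whole prompt in parallel and is compute-bound;
# decode emits one token at a time, re-reading the weights and the growing
# KV cache on every step, which makes it memory-bandwidth-bound. That gap
# is what continuous and disaggregated batching try to exploit.
cached_tokens = 4096
print(f"KV cache at {cached_tokens} tokens: "
      f"{cached_tokens * kv_bytes_per_token / 2**30:.2f} GiB")
```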
Key takeaway: Google introduced context caching for Gemini a couple of weeks ago, and he emphasized how big a deal this is. The feature lets users cache large, complex inputs (e.g. legal documents, medical records, long videos, images, audio files), removing the need to re-ingest that content every time a question is asked. It also significantly reduces cost, because storing a context cache for a period of time is far cheaper than reprocessing large documents on every query.
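Here is a toy sketch of that cost structure. The per-token prices are hypothetical placeholders to show the shape of the tradeoff, not Google's actual rates.

```python
# All prices below are made-up placeholders, not real Gemini pricing.
doc_tokens = 500_000                     # e.g. a long legal document
queries_per_day = 200
input_price = 1.00 / 1_000_000           # $/token to process input from scratch
cached_price = 0.25 / 1_000_000          # $/token to read input from the cache
storage_price = 1.00 / 1_000_000         # $/token-hour to keep the cache alive

without_cache = doc_tokens * queries_per_day * input_price
with_cache = (doc_tokens * queries_per_day * cached_price
              + doc_tokens * 24 * storage_price)     # one day of storage
print(f"without caching: ${without_cache:.2f}/day")
print(f"with caching:    ${with_cache:.2f}/day")
```

The exact crossover depends on query volume and how long the cache is kept alive, but the shape is the point: caching trades a flat storage fee for much cheaper repeated reads.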
Making of Devin by Cognition AI: Scott Wu - CEO, Cognition
Speaker: Scott Wu
YouTube video: GPUs & Inference Track (time: 3 hours, 38 minutes)
This was one of the most highly anticipated talks given that Devin made such a huge splash when it was first introduced. It was awesome to see the demo live.
Highlights:
Devin is billed as a fully autonomous software engineer.
He started the talk with a live demo of Devin building a mobile-friendly website for playing the name game with speakers at the World's Fair, then walked through the flow: he gave the prompt, and Devin made a plan, created a new directory, wrote the code, read the data, and deployed a first version.
Key takeaway: The team uses Devin to build Devin. The example he walked through was Devin building a search bar for searching within coding sessions; Devin has also built out metrics tracking, among other things. Scott argued that the first wave of generative AI was text-completion products, but that the second wave will be autonomous decision-makers (agents). He thinks coding agents are exciting for three reasons: software engineering lends itself naturally to agentic workflows, the ability to iterate with code feedback is crucial, and models' agentic abilities are rapidly improving.
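For a sense of what "iterate with code feedback" means mechanically, here is a minimal sketch of such a loop. The generate() function is a hypothetical stand-in for an LLM call, and none of this reflects Devin's actual architecture.

```python
import subprocess

def generate(prompt: str) -> str:
    # Hypothetical stand-in: call your LLM of choice here.
    raise NotImplementedError

def agent_loop(task: str, max_iters: int = 5) -> str:
    feedback = ""
    for _ in range(max_iters):
        code = generate(f"Task: {task}\nPrevious errors:\n{feedback}")
        with open("attempt.py", "w") as f:
            f.write(code)
        result = subprocess.run(["python", "attempt.py"],
                                capture_output=True, text=True)
        if result.returncode == 0:      # the code ran cleanly: we're done
            return code
        feedback = result.stderr        # feed the error back into the next attempt
    raise RuntimeError("agent did not converge")
```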
Photos
Claude powering Minecraft
Hanging out with Sandra Kublik from Cohere and Christina Li from FPV Ventures
Scott Wu from Cognition presenting the Devin demo
I also attended HF0 demo day last week! This is a picture from one of the “Dreamflow” mirror screens. Below is me in anime form!
Photo of my first driverless Waymo ride!! The car perfectly slowed down as a biker whipped past us.
Thanks for reading!