Episodes

  • Klarna CEO Sebastian Siemiatkowski on Getting AI to Do the Work of 700 Customer Service Reps
    Jul 23 2024
    In February, Sebastian Siemiatkowski boldly announced that Klarna’s new OpenAI-powered assistant handled two thirds of the Swedish fintech’s customer service chats in its first month. Not only were customer satisfaction metrics better, but by replacing 700 full-time contractors the bottom line impact is projected to be $40M. Since then, every company we talk to wants to know, “How do we get the Klarna customer support thing?” Co-founder and CEO Sebastian Siemiatkowski tells us how the Klarna team shipped this new product in record time, and how embracing AI internally with an experimental mindset is transforming the company. He discusses how AI development is proliferating inside the company, from customer support to marketing to internal knowledge to customer-facing experiences. Sebastian also reflects on the impacts of AI on employment, society, and the arts while encouraging lawmakers to be open-minded about the benefits.
    Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
    Mentioned in this episode:
    DeepL: Language translation app that Sebastian says makes 10,000 translators in Brussels redundant
    The Klarna brand: The offbeat optimism that the company is now augmenting with AI
    Neo4j: The graph database management system that Klarna is using to build Kiki, their internal knowledge base
    00:00 Introduction
    01:57 Klarna’s business
    03:00 Pitching OpenAI
    08:51 How we built this
    10:46 Will Klarna ever completely replace its CS team with AI?
    14:22 The benefits
    17:25 If you had a policy magic wand…
    21:12 What jobs will be most affected by AI?
    23:58 How about marketing?
    27:55 How creative are LLMs?
    30:11 Klarna’s knowledge graph, Kiki
    33:10 Reducing the number of enterprise systems
    35:24 Build vs buy?
    39:59 What’s next for Klarna with AI?
    48:48 Lightning round
    52 mins
  • Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs
    Jul 16 2024
    LLMs are democratizing digital intelligence, but we’re all waiting for AI agents to take this to the next level by planning tasks and executing actions to actually transform the way we work and live our lives. Yet despite incredible hype around AI agents, we’re still far from that “tipping point” with best-in-class models today. As one measure: coding agents are now scoring in the high-teens % on the SWE-bench benchmark for resolving GitHub issues, which far exceeds the previous unassisted baseline of 2% and the assisted baseline of 5%, but we’ve still got a long way to go. Why is that? What do we need to truly unlock agentic capability for LLMs? What can we learn from researchers who have built both the most powerful agents in the world, like AlphaGo, and the most powerful LLMs in the world? To find out, we’re talking to Misha Laskin, former research scientist at DeepMind. Misha is embarking on his vision to build the best agent models by bringing the search capabilities of RL together with LLMs at his new company, Reflection AI. He and his cofounder Ioannis Antonoglou, co-creator of AlphaGo and AlphaZero and RLHF lead for Gemini, are leveraging their unique insights to train the most reliable models for developers building agentic workflows.
    Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
    00:00 Introduction
    01:11 Leaving Russia, discovering science
    10:01 Getting into AI with Ioannis Antonoglou
    15:54 Reflection AI and agents
    25:41 The current state of AI agents
    29:17 AlphaGo, AlphaZero and Gemini
    32:58 LLMs don’t have a ground truth reward
    37:53 The importance of post-training
    44:12 Task categories for agents
    45:54 Attracting talent
    50:52 How far away are capable agents?
    56:01 Lightning round
    Mentioned:
    The Feynman Lectures on Physics: The classic text that got Misha interested in science.
    Mastering the game of Go with deep neural networks and tree search: The original 2016 AlphaGo paper.
    Mastering the game of Go without human knowledge: 2017 AlphaGo Zero paper
    Scaling Laws for Reward Model Overoptimization: OpenAI paper on how reward models can be gamed at all scales for all algorithms.
    Mapping the Mind of a Large Language Model: Article about Anthropic mechanistic interpretability paper that identifies how millions of concepts are represented inside Claude Sonnet
    Pieter Abbeel: Berkeley professor and founder of Covariant who Misha studied with
    A2C and A3C: Advantage Actor Critic and Asynchronous Advantage Actor Critic, the two algorithms developed by Misha’s manager at DeepMind, Volodymyr Mnih, that defined reinforcement learning and deep reinforcement learning
    1 hr and 7 mins
  • Microsoft CTO Kevin Scott on How Far Scaling Laws Will Extend
    Jul 9 2024
    The current LLM era is the result of scaling the size of models in successive waves (and the compute to train them). It is also the result of better-than-Moore’s-Law price vs performance ratios in each new generation of Nvidia GPUs. The largest platform companies are continuing to invest in scaling as the prime driver of AI innovation. Are they right, or will marginal returns level off soon, leaving hyperscalers with too much hardware and too few customer use cases? To find out, we talk to Microsoft CTO Kevin Scott who has led their AI strategy for the past seven years. Scott describes himself as a “short-term pessimist, long-term optimist” and he sees the scaling trend as durable for the industry and critical for the establishment of Microsoft’s AI platform. Scott believes there will be a shift across the compute ecosystem from training to inference as the frontier models continue to improve, serving wider and more reliable use cases. He also discusses the coming business models for training data, and even what ad units might look like for autonomous agents.
    Hosted by: Pat Grady and Bill Coughran, Sequoia Capital
    Mentioned:
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, the 2018 Google paper that convinced Kevin that Microsoft wasn’t moving fast enough on AI.
    Dennard scaling: The scaling law that describes the proportional relationship between transistor size and power use; has not held since 2012 and is often confused with Moore’s Law.
    Textbooks Are All You Need: Microsoft paper that introduces a new large language model for code, phi-1, that achieves smaller size by using higher quality “textbook” data.
    GPQA and MMLU: Benchmarks for reasoning
    Copilot: Microsoft product line of GPT consumer assistants from general productivity to design, vacation planning, cooking and fitness.
    Devin: Autonomous AI code agent from Cognition Labs that Microsoft recently announced a partnership with.
    Ray Solomonoff: Participant in the 1956 Dartmouth Summer Research Project on Artificial Intelligence that named the field; Kevin admires his prescience about the importance of probabilistic methods decades before anyone else.
    00:00 Introduction
    01:20 Kevin’s backstory
    06:56 The role of PhDs in AI engineering
    09:56 Microsoft’s AI strategy
    12:40 Highlights and lowlights
    16:28 Accelerating investments
    18:38 The OpenAI partnership
    22:46 Soon inference will dwarf training
    27:56 Will the demand/supply balance change?
    30:51 Business models for data
    36:54 The value function
    39:58 Copilots
    44:47 The 98/2 rule
    49:34 Solving zero-sum games
    57:13 Lightning round
    1 hr
  • Zapier’s Mike Knoop Launches ARC Prize to Jumpstart New Ideas for AGI
    Jul 2 2024
    As impressive as LLMs are, the growing consensus is that language, scale and compute won’t get us to AGI. Although many AI benchmarks have quickly achieved human-level performance, there is one eval that has barely budged since it was created in 2019. Google researcher François Chollet wrote a paper that year defining intelligence as skill-acquisition efficiency: the ability to learn new skills as humans do, from a small number of examples. To make it testable he proposed a new benchmark, the Abstraction and Reasoning Corpus (ARC), designed to be easy for humans, but hard for AI. Notably, it doesn’t rely on language. Zapier co-founder Mike Knoop read Chollet’s paper as the LLM wave was rising. He worked quickly to integrate generative AI into Zapier’s product, but kept coming back to the lack of progress on the ARC benchmark. In June, Knoop and Chollet launched the ARC Prize, a public competition offering more than $1M to beat and open-source a solution to the ARC-AGI eval. In this episode Mike talks about the new ideas required to solve ARC, shares updates from the first two weeks of the competition, and shares why he’s excited for AGI systems that can innovate alongside humans.
    Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
    Mentioned:
    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: The 2022 paper that first caught Mike’s attention about the capabilities of LLMs
    On the Measure of Intelligence: 2019 paper by Google researcher François Chollet that introduced the ARC benchmark, which remains unbeaten
    ARC Prize 2024: The $1M+ competition Mike and François have launched to drive interest in solving the ARC-AGI eval
    Sequence to Sequence Learning with Neural Networks: Ilya Sutskever paper from 2014 that influenced the direction of machine translation with deep neural networks.
    Etched: Luke Miles on LessWrong wrote about the first ASIC chip that accelerates transformers on silicon
    Kaggle: The leading data science competition platform and online community, acquired by Google in 2017
    Lab42: Swiss AI lab that hosted ARCathon, the precursor to ARC Prize
    Jack Cole: Researcher on team that was #1 on the leaderboard for ARCathon
    Ryan Greenblatt: Researcher with current high score (50%) on ARC public leaderboard
    00:00 Introduction
    01:51 AI at Zapier
    08:31 What is ARC AGI?
    13:25 What does it mean to efficiently acquire a new skill?
    19:03 What approaches will succeed?
    21:11 A little bit of a different shape
    25:59 The role of code generation and program synthesis
    29:11 What types of people are working on this?
    31:45 Trying to prove you wrong
    34:50 Where are the big labs?
    38:21 The world post-AGI
    42:51 When will we cross 85% on ARC AGI?
    46:12 Will LLMs be part of the solution?
    50:13 Lightning round
    55 mins
  • Factory’s Matan Grinberg and Eno Reyes Unleash the Droids on Software Development
    Jun 25 2024
    Archimedes said that with a large enough lever, you can move the world. For decades, software engineering has been that lever. And now, AI is compounding that lever. How will we use AI to apply 100 or 1000x leverage to the greatest lever to move the world? Matan Grinberg and Eno Reyes, co-founders of Factory, have chosen to do things differently than many of their peers in this white-hot space. They sell a fleet of “Droids,” purpose-built dev agents which accomplish different tasks in the software development lifecycle (like code review, testing, pull requests or writing code). Rather than training their own foundation model, their approach is to build something useful for engineering orgs today on top of the rapidly improving models, aligning with the developer and evolving with them. Matan and Eno are optimistic about the effects of autonomy in software development and on building a company in the application layer. Their advice to founders: “The only way you can win is by executing faster and being more obsessed.”
    Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
    Mentioned:
    Juan Maldacena, Institute for Advanced Study, string theorist that Matan cold called as an undergrad
    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, small-model open-source software engineering agent
    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, an evaluation framework for GitHub issues
    Monte Carlo tree search, a 2006 algorithm for solving decision making in games (and used in AlphaGo)
    Language agent tree search, a framework for LLM planning, acting and reasoning
    The Bitter Lesson, Rich Sutton’s essay on scaling in search and learning
    Code churn, time to merge, cycle time, metrics Factory thinks are important to eng orgs
    Transcript: https://www.sequoiacap.com/podcast/training-data-factory/
    00:00 Introduction
    01:36 Personal backgrounds
    10:54 The compound lever
    12:41 What is Factory?
    16:29 Cognitive architectures
    21:13 800 engineers at OpenAI are working on my margins
    24:00 Jeff Dean doesn't understand your code base
    25:40 Individual dev productivity vs system-wide optimization
    30:04 Results: Factory in action
    32:54 Learnings along the way
    35:36 Fully autonomous Jeff Deans
    37:56 Beacons of the upcoming age
    40:04 How far are we?
    43:02 Competition
    45:32 Lightning round
    49:34 Bonus round: Factory's SWE-bench results
    59 mins
  • LangChain’s Harrison Chase on Building the Orchestration Layer for AI Agents
    Jun 18 2024
    Last year, AutoGPT and Baby AGI captured our imaginations. Agents quickly became the buzzword of the day… and then things went quiet. AutoGPT and Baby AGI may have marked a peak in the hype cycle, but this year has seen a wave of agentic breakouts on the product side, from Klarna’s customer support AI to Cognition’s Devin, etc. Harrison Chase of LangChain is focused on enabling the orchestration layer for agents. In this conversation, he explains what’s changed that’s allowing agents to improve performance and find traction. Harrison shares what he’s optimistic about, where he sees promise for agents vs. what he thinks will be trained into models themselves, and discusses novel kinds of UX that he imagines might transform how we experience agents in the future.
    Hosted by: Sonya Huang and Pat Grady, Sequoia Capital
    Mentioned:
    ReAct: Synergizing Reasoning and Acting in Language Models, the first cognitive architecture for agents
    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, small-model open-source software engineering agent from researchers at Princeton
    Devin, autonomous software engineering from Cognition
    V0: Generative UI agent from Vercel
    GPT Researcher, a research agent
    Language Model Cascades: 2022 paper by Google Brain and now OpenAI researcher David Dohan that was influential for Harrison in developing LangChain
    Transcript: https://www.sequoiacap.com/podcast/training-data-harrison-chase/
    00:00 Introduction
    01:21 What are agents?
    05:00 What is LangChain’s role in the agent ecosystem?
    11:13 What is a cognitive architecture?
    13:20 Is bespoke and hard coded the way the world is going, or a stop gap?
    18:48 Focus on what makes your beer taste better
    20:37 So what?
    22:20 Where are agents getting traction?
    25:35 Reflection, chain of thought, other techniques?
    30:42 UX can influence the effectiveness of the architecture
    35:30 What’s out of scope?
    38:04 Fine tuning vs prompting?
    42:17 Existing observability tools for LLMs vs needing a new architecture/approach
    45:38 Lightning round
    50 mins
  • Introducing "Training Data"
    Jun 5 2024
    Join us as we train our neural nets on the theme of the century: AI. Sequoia Capital partners Sonya Huang and Pat Grady host conversations with leading AI builders and researchers to ask critical questions and develop a deeper understanding of the evolving technologies and their implications for technology, business and society. The content of this podcast does not constitute investment advice, an offer to provide investment advisory services, or an offer to sell or solicitation of an offer to buy an interest in any investment fund.
    1 min