Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs
Jul 16 2024
Length: 1 hr and 7 mins
Podcast

Summary
LLMs are democratizing digital intelligence, but we’re all waiting for AI agents to take this to the next level by planning tasks and executing actions to actually transform the way we work and live our lives. Yet despite incredible hype around AI agents, we’re still far from that “tipping point” with best-in-class models today. As one measure: coding agents are now scoring in the high-teens % on the SWE-bench benchmark for resolving GitHub issues, which far exceeds the previous unassisted baseline of 2% and the assisted baseline of 5%, but we’ve still got a long way to go.

Why is that? What do we need to truly unlock agentic capability for LLMs? What can we learn from researchers who have built both the most powerful agents in the world, like AlphaGo, and the most powerful LLMs in the world?

To find out, we’re talking to Misha Laskin, former research scientist at DeepMind. Misha is embarking on his vision to build the best agent models by bringing the search capabilities of RL together with LLMs at his new company, Reflection AI. He and his cofounder Ioannis Antonoglou, co-creator of AlphaGo and AlphaZero and RLHF lead for Gemini, are leveraging their unique insights to train the most reliable models for developers building agentic workflows.

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

00:00 Introduction
01:11 Leaving Russia, discovering science
10:01 Getting into AI with Ioannis Antonoglou
15:54 Reflection AI and agents
25:41 The current state of AI agents
29:17 AlphaGo, AlphaZero and Gemini
32:58 LLMs don’t have a ground truth reward
37:53 The importance of post-training
44:12 Task categories for agents
45:54 Attracting talent
50:52 How far away are capable agents?
56:01 Lightning round

Mentioned:

The Feynman Lectures on Physics: The classic text that got Misha interested in science.
Mastering the game of Go with deep neural networks and tree search: The original 2016 AlphaGo paper.
Mastering the game of Go without human knowledge: The 2017 AlphaGo Zero paper.
Scaling Laws for Reward Model Overoptimization: OpenAI paper on how reward models can be gamed at all scales for all algorithms.
Mapping the Mind of a Large Language Model: Article about the Anthropic mechanistic interpretability paper that identifies how millions of concepts are represented inside Claude Sonnet.
Pieter Abbeel: Berkeley professor and founder of Covariant, with whom Misha studied.
A2C and A3C: Advantage Actor Critic and Asynchronous Advantage Actor Critic, the two algorithms developed by Misha’s manager at DeepMind, Volodymyr Mnih, that defined reinforcement learning and deep reinforcement learning.

Economics

Show more Show less

What listeners say about Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

Average customer ratings

Reviews - Please select the tabs below to change the source of reviews.

Audible.com reviews

Amazon reviews

No Reviews are Available

Report a review on Amazon

Get Started

Popular Lists

Explore Audible

Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

Failed to add items

Add to Cart failed.

Add to Wish List failed.

Remove from wishlist failed.

Adding to library failed

Follow podcast failed

Unfollow podcast failed

Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

Summary

What listeners say about Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

Reviews - Please select the tabs below to change the source of reviews.

Audible.com reviews

Amazon reviews