Episodes

  • The Markovian Thinker
    Oct 16 2025
    In this episode, we discuss The Markovian Thinker by Milad Aghajohari, Kamran Chitsaz, Amirhossein Kazemnejad, Sarath Chandar, Alessandro Sordoni, Aaron Courville, Siva Reddy. The paper proposes Markovian Thinking, a reinforcement learning paradigm that limits reasoning context to a constant-size state, enabling linear compute with constant memory rather than quadratic overhead. They implement this approach in Delethink, an environment that segments reasoning into fixed-size chunks with learned textual states to seamlessly continue reasoning after resets. Experiments show Delethink-trained models achieve longer reasoning chains more efficiently and scale better than standard methods, significantly reducing computational costs.
    8 m
  • DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
    Oct 8 2025
    In this episode, we discuss DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL by Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong. The paper introduces DeepDive, a method to improve large language models' deep search capabilities by automatically generating complex questions and applying multi-turn reinforcement learning for enhanced long-horizon reasoning. DeepDive-32B outperforms existing open-source models on browsing benchmarks like BrowseComp. The approach also enables scalable tool usage during inference, with all resources made publicly available.
    8 m
  • Towards a Physics Foundation Model
    Oct 3 2025
    In this episode, we discuss Towards a Physics Foundation Model by Florian Wiesner, Matthias Wessling, Stephen Baek. This paper introduces the General Physics Transformer (GPhyT), a foundation model trained on diverse simulation data that can simulate multiple complex physical systems without explicit knowledge of governing equations. GPhyT outperforms specialized models by up to 29 times, generalizes zero-shot to unseen physics tasks, and maintains stable predictions over long time horizons. This work demonstrates the feasibility of a universal physics foundation model, potentially revolutionizing computational science by eliminating the need for task-specific solvers.
    7 m
  • Scalable Option Learning in High-Throughput Environments
    Sep 30 2025
    In this episode, we discuss Scalable Option Learning in High-Throughput Environments by Mikael Henaff, Scott Fujimoto, Michael Rabbat. The paper presents Scalable Option Learning (SOL), a hierarchical reinforcement learning algorithm designed for high-throughput environments. SOL achieves a 25x increase in training speed, and after training on 20 billion frames of the game NetHack it outperforms flat agents. The method is also validated on MiniHack and MuJoCo, demonstrating broad applicability and scalability.
    8 m
  • Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    Sep 24 2025
    In this episode, we discuss Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning by Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin. This paper investigates Reinforcement Learning with Verifiable Rewards (RLVR) by analyzing token entropy patterns during Chain-of-Thought reasoning in Large Language Models. It finds that a small subset of high-entropy "forking" tokens critically guide reasoning pathways and that RLVR primarily adjusts these tokens to improve performance. Leveraging this insight, the authors enhance RLVR efficiency by focusing updates on these tokens, achieving better results with fewer token updates across multiple model scales.
    8 m
  • Reverse-Engineered Reasoning for Open-Ended Generation
    Sep 19 2025
    In this episode, we discuss Reverse-Engineered Reasoning for Open-Ended Generation by Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Wei Ye, Tong Yang, Wenhao Huang, Ge Zhang, Fangzhen Lin. The paper introduces REverse-Engineered Reasoning (REER), a novel backward approach that uncovers deep reasoning steps from known good solutions instead of forward trial-and-error or imitation. Using REER, the authors create DeepWriting-20K, a large dataset of reasoning trajectories for open-ended tasks, and train DeepWriter-8B, a model that outperforms strong open-source baselines. DeepWriter-8B also matches or exceeds the performance of leading proprietary models like GPT-4o and Claude 3.5.
    9 m
  • Scaling Performance of Large Language Model Pretraining
    Sep 16 2025
    In this episode, we discuss Scaling Performance of Large Language Model Pretraining by Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther. The paper explores the challenges and strategies involved in training large language models (LLMs) at scale, focusing on distributed training and managing massive datasets across many computing nodes. It provides practical recommendations for optimizing data parallelism to fully utilize GPU resources during pretraining. The goal is to offer clearer guidance on scaling LLM training pipelines, addressing a gap in publicly available information.
    7 m
  • General Social Agents
    Sep 15 2025
    In this episode, we discuss General Social Agents by Benjamin S. Manning, John J. Horton. The paper proposes using AI agents guided by social science theory and natural language instructions to predict human behavior in novel settings without ad hoc adjustments. By training these agents on human data from related "seed" games, they successfully predict outcomes across a large and diverse set of new games. Their approach outperforms traditional game-theoretic predictions and existing AI models, even exceeding predictions based on published human data in some novel scenarios.
    9 m
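To make the constant-memory idea from The Markovian Thinker episode concrete, here is a minimal sketch of a chunked reasoning loop. It assumes that the state carried across a reset is simply the last few tokens of each chunk, and that a model can be called as a plain function. The names `chunked_reasoning` and `make_counter_model` are hypothetical, and this is not the authors' Delethink code. The point it shows is that the context fed to the model never exceeds query + carryover + chunk size, no matter how long the full reasoning trace grows.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: given a context and a token budget,
# return up to `max_new` newly generated tokens.
Generator = Callable[[List[str], int], List[str]]


def chunked_reasoning(
    query: Sequence[str],
    generate: Generator,
    chunk_size: int,
    carryover: int,
    max_chunks: int,
    stop_token: str = "<end>",
) -> Tuple[List[str], int]:
    """Reason in fixed-size chunks, keeping only a short textual state between them.

    Returns the full reasoning trace and the largest context the model ever saw.
    """
    trace: List[str] = []
    state: List[str] = []
    max_context = 0
    for _ in range(max_chunks):
        # The context is bounded: the original query plus the carried-over state only.
        context = list(query) + state
        chunk = generate(context, chunk_size)[:chunk_size]
        max_context = max(max_context, len(context) + len(chunk))
        trace.extend(chunk)
        if stop_token in chunk:
            break
        # Reset: discard everything except the last `carryover` tokens of this chunk.
        state = chunk[-carryover:] if carryover > 0 else []
    return trace, max_context


def make_counter_model(total: int) -> Generator:
    """Toy stand-in for a language model: emits t0, t1, ... then a stop token."""
    produced = 0

    def generate(context: List[str], max_new: int) -> List[str]:
        nonlocal produced
        out: List[str] = []
        for _ in range(max_new):
            if produced >= total:
                out.append("<end>")
                break
            out.append(f"t{produced}")
            produced += 1
        return out

    return generate


trace, peak = chunked_reasoning(
    ["q1", "q2", "q3"], make_counter_model(30), chunk_size=8, carryover=2, max_chunks=20
)
print(len(trace), peak)  # 31 13
```

In this run the trace (31 tokens) is much longer than one chunk, yet the peak context stays at 13 tokens (3 query + 2 carryover + 8 generated), which is the bounded-memory behaviour the episode describes.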
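As an illustration of the "forking token" idea from the Beyond the 80/20 Rule episode, the sketch below computes the Shannon entropy of each position's next-token distribution and keeps a mask only for the highest-entropy 20% of positions (the fraction suggested by the episode title). The helper names are hypothetical, and for simplicity the sketch ranks positions within a single sequence. In an RLVR setup, only the masked-in positions would contribute to the policy-gradient update; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H = -sum_j p_j * log(p_j) of one next-token distribution (nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_mask(dists: Sequence[Sequence[float]], top_fraction: float = 0.2) -> List[bool]:
    """Mark positions whose entropy falls in the top `top_fraction` of the sequence."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    entropies = [token_entropy(d) for d in dists]
    k = max(1, math.ceil(top_fraction * len(entropies)))
    # Rank positions from most to least uncertain and keep the top k.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(entropies))]


confident = [0.97, 0.01, 0.01, 0.01]  # near-deterministic continuation
forking = [0.25, 0.25, 0.25, 0.25]    # high-entropy "fork" in the reasoning path
mask = high_entropy_mask([confident, confident, forking, confident, confident])
print(mask)  # [False, False, True, False, False]
```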