Episodes

  • Scaling Performance of Large Language Model Pretraining
    Sep 16 2025
    In this episode, we discuss Scaling Performance of Large Language Model Pretraining by Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther. The paper explores the challenges and strategies involved in training large language models (LLMs) at scale, focusing on distributed training and managing massive datasets across many computing nodes. It provides practical recommendations for optimizing data parallelism to fully utilize GPU resources during pretraining. The goal is to offer clearer guidance on scaling LLM training pipelines, addressing a gap in publicly available information.
    7 m
  • General Social Agents
    Sep 15 2025
    In this episode, we discuss General Social Agents by Benjamin S. Manning, John J. Horton. The paper proposes using AI agents guided by social science theory and natural language instructions to predict human behavior in novel settings without ad hoc adjustments. By training these agents on human data from related "seed" games, they successfully predict outcomes across a large and diverse set of new games. Their approach outperforms traditional game-theoretic predictions and existing AI models, even exceeding predictions based on published human data in some novel scenarios.
    9 m
  • We need a new ethics for a world of AI agents
    Sep 12 2025
    In this episode, we discuss We need a new ethics for a world of AI agents by Iason Gabriel, Geoff Keeling, Arianna Manzini & James Evans. The paper examines the shift toward autonomous AI agents capable of goal-directed actions with minimal human oversight. It highlights both the potential benefits of these agents, such as economic growth and scientific advancement, and the associated risks involving responsibility, safety, and social dynamics. The authors call for increased collaboration among various stakeholders to address challenges and ensure beneficial human-agent and agent-agent interactions.
    7 m
  • Hierarchical Reasoning Model
    Sep 11 2025
    In this episode, we discuss Hierarchical Reasoning Model by Guan Wang, Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, Meng Lu, Sen Song, Yasin Abbasi Yadkori. The paper introduces the Hierarchical Reasoning Model (HRM), a recurrent architecture inspired by the brain's hierarchical processing that achieves deep, efficient reasoning in a single forward pass. HRM uses two interdependent modules for abstract planning and detailed computation, enabling it to excel on complex tasks like Sudoku and maze solving with minimal data and no pre-training. It outperforms larger models on the ARC benchmark, highlighting its promise for advancing general-purpose AI reasoning.
    9 m
  • ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
    Sep 10 2025
    In this episode, we discuss ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts by Yuying Ge, Yixiao Ge, Chen Li, Teng Wang, Junfu Pu, Yizhuo Li, Lu Qiu, Jin Ma, Lisheng Duan, Xinyu Zuo, Jinwen Luo, Weibo Gu, Zexuan Li, Xiaojing Zhang, Yangyu Tao, Han Hu, Di Wang, Ying Shan. The paper presents ARC-Hunyuan-Video, a 7B-parameter multimodal model designed for detailed, temporally-structured understanding of short user-generated videos using visual, audio, and text inputs. It supports tasks like timestamped captioning, summarization, question answering, and video reasoning, trained through a multi-stage process including reinforcement learning. Evaluations show strong real-world performance, efficiency, and positive impact on user engagement in production deployment.
    8 m
  • Small Language Models are the Future of Agentic AI
    Sep 9 2025
    In this episode, we discuss Small Language Models are the Future of Agentic AI by Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, Pavlo Molchanov. The paper argues that small language models (SLMs) are sufficiently powerful, more suitable, and more cost-effective than large language models (LLMs) for many specialized agentic AI tasks. It proposes that heterogeneous agentic systems using multiple models are ideal when general conversational abilities are needed and presents an algorithm for converting LLM-based agents to SLM-based ones. The authors emphasize the economic and operational benefits of shifting towards SLMs and invite further discussion to advance affordable AI deployment.
    8 m
  • Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
    Sep 8 2025
    In this episode, we discuss Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents by Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel. The paper introduces a framework enabling large language model agents to dynamically decide when to plan during task execution, improving efficiency and performance. They propose a two-stage training process combining supervised fine-tuning and reinforcement learning to develop this capability. Experiments show these dynamically planning agents are more sample-efficient, achieve complex goals better, and can be guided by human plans.
    7 m
  • Why Language Models Hallucinate
    Sep 7 2025
    In this episode, we discuss Why Language Models Hallucinate by Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, Edwin Zhang. The paper explains that hallucinations in large language models arise because training and evaluation reward guessing over admitting uncertainty, and it frames the issue as errors in binary classification. It shows that models are incentivized to produce plausible but incorrect answers to perform well on benchmarks. The authors propose that addressing hallucinations requires changing how benchmarks are scored so that expressions of uncertainty are no longer penalized, which would promote more trustworthy AI.
    8 m