Episodes

  • Satinder Singh: The Origin Story of RLDM @ RLDM 2025
    Jun 25 2025

    Professor Satinder Singh of Google DeepMind and the University of Michigan is a co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).

    Recorded on location at Trinity College Dublin, Ireland during RLDM 2025.

    Featured References

    RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
    June 11-14, 2025 at Trinity College Dublin, Ireland

    Satinder Singh on Google Scholar

    6 m
  • NeurIPS 2024 - Posters and Hallways 3
    Mar 9 2025

    Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver, BC, Canada.

    Featuring

    • Claire Bizon Monroc from Inria: WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control
    • Andrew Wagenmaker from UC Berkeley: Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL
    • Harley Wiltzer from MILA: Foundations of Multivariate Distributional Reinforcement Learning
    • Vinzenz Thoma from ETH AI Center: Contextual Bilevel Reinforcement Learning for Incentive Alignment
    • Haozhe (Tony) Chen & Ang (Leon) Li from Columbia: QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers
    10 m
  • NeurIPS 2024 - Posters and Hallways 2
    Mar 5 2025

    Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver, BC, Canada.

    Featuring

    • Jonathan Cook from University of Oxford: Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
    • Yifei Zhou from Berkeley AI Research: DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
    • Rory Young from University of Glasgow: Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
    • Glen Berseth from MILA: Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
    • Alexander Rutherford from University of Oxford: JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

    9 m
  • NeurIPS 2024 - Posters and Hallways 1
    Mar 3 2025

    Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver, BC, Canada.

    Featuring

    • Jiaheng Hu of University of Texas: Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning
    • Skander Moalla of EPFL: No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
    • Adil Zouitine of IRT Saint Exupery/Hugging Face: Time-Constrained Robust MDPs
    • Soumyendu Sarkar of HP Labs: SustainDC: Benchmarking for Sustainable Data Center Control
    • Matteo Bettini of Cambridge University: BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
    • Michael Bowling of U Alberta: Beyond Optimism: Exploration With Partially Observable Rewards
    10 m
  • Abhishek Naik on Continuing RL & Average Reward
    Feb 10 2025

    Abhishek Naik was a student at the University of Alberta and the Alberta Machine Intelligence Institute, where he recently finished his PhD in reinforcement learning, working with Rich Sutton. He is now a postdoctoral fellow at the National Research Council of Canada, where he does AI research on space applications.

    Featured References

    Reinforcement Learning for Continuing Problems Using Average Reward
    Abhishek Naik Ph.D. dissertation 2024

    Reward Centering
    Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton 2024

    Learning and Planning in Average-Reward Markov Decision Processes
    Yi Wan, Abhishek Naik, Richard S. Sutton 2020

    Discounted Reinforcement Learning Is Not an Optimization Problem
    Abhishek Naik, Roshan Shariff, Niko Yasui, Hengshuai Yao, Richard S. Sutton 2019


    Additional References

    • Explaining dopamine through prediction errors and beyond, Gershman et al. 2024 (proposes a Differential-TD-like learning mechanism in the brain; see around Box 4)


    1 h 22 m
  • NeurIPS 2024 RL meetup Hot takes: What sucks about RL?
    Dec 23 2024

    What do RL researchers complain about after hours at the bar? In this "Hot takes" episode, we find out!

    Recorded at The Pearl in downtown Vancouver, during the RL meetup after a day of NeurIPS 2024.

    Special thanks to "David Beckham" for the inspiration :)

    18 m
  • RLC 2024 - Posters and Hallways 5
    Sep 20 2024

    Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst, MA.

    Featuring:

    • 0:01 David Radke of the Chicago Blackhawks NHL on RL for professional sports
    • 0:56 Abhishek Naik from the National Research Council on Continuing RL and Average Reward
    • 2:42 Daphne Cornelisse from NYU on Autonomous Driving and Multi-Agent RL
    • 8:58 Shray Bansal from Georgia Tech on Cognitive Bias for Human AI Ad hoc Teamwork
    • 10:21 Claas Voelcker from University of Toronto on Can we hop in general?
    • 11:23 Brent Venable from The Institute for Human & Machine Cognition on Cooperative information dissemination


    13 m
  • RLC 2024 - Posters and Hallways 4
    Sep 19 2024

    Posters and Hallway episodes are short interviews and poster summaries. Recorded at RLC 2024 in Amherst, MA.

    Featuring:

    • 0:01 David Abel from DeepMind on 3 Dogmas of RL
    • 0:55 Kevin Wang from Brown on learning variable depth search for MCTS
    • 2:17 Ashwin Kumar from Washington University in St Louis on fairness in resource allocation
    • 3:36 Prabhat Nagarajan from UAlberta on Value overestimation
    5 m