Episodes

  • Vincent Moens on TorchRL
    Apr 8 2024

    Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta and an author of TorchRL and TensorDict in PyTorch.

    Featured References

    TorchRL: A data-driven decision-making library for PyTorch
    Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens


    Additional References

    • TorchRL on github
    • TensorDict Documentation


    40 mins
  • Arash Ahmadian on Rethinking RLHF
    Mar 25 2024

    Arash Ahmadian is a Researcher at Cohere and Cohere For AI, focused on Preference Training of large language models. He's also a researcher at the Vector Institute for AI.

    Featured Reference

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker


    Additional References

    • Self-Rewarding Language Models, Yuan et al 2024
    • Reinforcement Learning: An Introduction, Sutton and Barto 2018
    • Learning from Delayed Rewards, Chris Watkins 1989
    • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992
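
    The REINFORCE estimator that the featured paper revisits for RLHF can be sketched in a few lines. Below is a minimal softmax-bandit illustration; the two-armed bandit, learning rate, and running-mean baseline are hypothetical choices for the sketch, not taken from the paper:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(logits):
        z = logits - logits.max()
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical two-armed bandit: arm 1 pays ~1.0, arm 0 pays ~0.2.
    true_means = np.array([0.2, 1.0])
    logits = np.zeros(2)
    baseline, lr = 0.0, 0.1

    for _ in range(3000):
        probs = softmax(logits)
        a = rng.choice(2, p=probs)
        r = rng.normal(true_means[a], 0.1)
        # REINFORCE update: grad of log pi(a) w.r.t. logits is
        # (one_hot(a) - probs); scale by the baselined reward.
        grad_log = -probs
        grad_log[a] += 1.0
        logits += lr * (r - baseline) * grad_log
        baseline += 0.1 * (r - baseline)  # running-mean baseline for variance reduction

    probs = softmax(logits)  # policy should now strongly prefer arm 1
    ```

    The baseline term is the same variance-reduction idea the paper builds on when comparing REINFORCE-style methods to PPO for LLM preference training.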
    34 mins
  • Glen Berseth on RL Conference
    Mar 11 2024

    Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of the Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).

    Featured Links

    Reinforcement Learning Conference

    Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
    Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach

    22 mins
  • Ian Osband
    Mar 7 2024

    Ian Osband is a research scientist at OpenAI (previously DeepMind and Stanford), working on decision making under uncertainty.

    We spoke about:

    - Information theory and RL

    - Exploration, epistemic uncertainty and joint predictions

    - Epistemic Neural Networks and scaling to LLMs


    Featured References

    Reinforcement Learning, Bit by Bit
    Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

    From Predictions to Decisions: The Importance of Joint Predictive Distributions

    Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy

    Epistemic Neural Networks

    Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy


    Approximate Thompson Sampling via Epistemic Neural Networks

    Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy


    Additional References

    • Thesis defence, Ian Osband
    • Homepage, Ian Osband
    • Epistemic Neural Networks at Stanford RL Forum
    • Behaviour Suite for Reinforcement Learning, Osband et al 2019
    • Efficient Exploration for LLMs, Dwaracherla et al 2024
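
    For context on the Thompson sampling papers above: exact Thompson sampling is tractable on a small bandit, and epistemic neural networks exist to approximate this kind of posterior sampling at scale. A minimal Bernoulli-bandit sketch (the arm probabilities and horizon are made up for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Exact Thompson sampling on a two-armed Bernoulli bandit.
    true_p = np.array([0.3, 0.7])   # hypothetical arm success probabilities
    alpha = np.ones(2)              # Beta posterior: successes + 1
    beta = np.ones(2)               # Beta posterior: failures + 1

    for _ in range(2000):
        theta = rng.beta(alpha, beta)   # sample one plausible model from the posterior
        a = int(np.argmax(theta))       # act greedily with respect to the sample
        r = rng.random() < true_p[a]    # observe a Bernoulli reward
        alpha[a] += r
        beta[a] += 1 - r

    pulls = alpha + beta - 2            # the better arm should dominate the pulls
    ```

    Sampling a single coherent model per step, rather than acting on marginal predictions, is exactly the property that requires the joint predictive distributions discussed in the featured papers.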
    1 hr and 8 mins
  • Sharath Chandra Raparthy
    Feb 12 2024

    Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!

    Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.


    Featured Reference

    Generalization to New Sequential Decision Making Tasks with In-Context Learning
    Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu

    Additional References

    • Sharath Chandra Raparthy Homepage
    • Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023
    • Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022
    • Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021


    41 mins
  • Pierluca D'Oro and Martin Klissarov
    Nov 13 2023

    Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!

    Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.


    Martin Klissarov is a PhD student at Mila and McGill, and a research scientist intern at Meta.


    Featured References

    Motif: Intrinsic Motivation from Artificial Intelligence Feedback
    Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

    Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
    Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare

    To keep doing RL research, stop calling yourself an RL researcher
    Pierluca D'Oro

    57 mins
  • Martin Riedmiller
    Aug 22 2023

    Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!


    Martin Riedmiller is a research scientist and team lead at DeepMind.


    Featured References


    Magnetic control of tokamak plasmas through deep reinforcement learning
    Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller


    Human-level control through deep reinforcement learning
    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis

    Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method
    Martin Riedmiller
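
    The core loop of fitted Q iteration, repeatedly fitting a function approximator to bootstrapped targets computed from a fixed batch of transitions, can be illustrated with a table standing in for the neural network. A toy chain MDP, not from the paper:

    ```python
    import numpy as np

    # Toy fitted Q iteration on a 3-state chain (illustrative sketch only,
    # not Riedmiller's neural-network version). States 0 -> 1 -> 2;
    # state 2 is terminal, and reaching it pays reward 1.
    gamma = 0.9

    def step(s, a):                      # a = 1 moves right, a = 0 moves left
        s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        return s2, float(s2 == 2)

    # Fixed batch of all transitions from the non-terminal states,
    # mirroring NFQ's "collect once, then fit offline" setting.
    batch = [(s, a, *step(s, a)) for s in (0, 1) for a in (0, 1)]

    Q = np.zeros((3, 2))
    for _ in range(50):                  # each sweep "fits" Q to fresh targets
        targets = {(s, a): r + gamma * Q[s2].max() * (s2 != 2)
                   for s, a, s2, r in batch}
        for (s, a), t in targets.items():
            Q[s, a] = t                  # a table can match the targets exactly

    greedy = Q[:2].argmax(axis=1)        # learned policy: move right in both states
    ```

    In NFQ the table is replaced by a neural network retrained on the whole batch each sweep, which is the data-efficient idea that later fed into DQN.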

    1 hr and 14 mins
  • Max Schwarzer
    Aug 8 2023

    Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.

    Featured References

    Bigger, Better, Faster: Human-level Atari with human-level efficiency
    Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

    Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
    Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville

    The Primacy Bias in Deep Reinforcement Learning
    Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville


    Additional References

    • Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017
    • When to use parametric models in reinforcement learning? Hasselt et al 2019
    • Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020
    • Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021



    1 hr and 10 mins