• Arash Ahmadian on Rethinking RLHF

  • Mar 25 2024
  • Length: 34 mins
  • Podcast

  • Summary

  • Arash Ahmadian is a researcher at Cohere and Cohere For AI, focused on preference training of large language models. He is also a researcher at the Vector Institute.

    Featured Reference

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker


    Additional References

    • Self-Rewarding Language Models, Yuan et al. 2024
    • Reinforcement Learning: An Introduction, Sutton and Barto 1998
    • Learning from Delayed Rewards, Chris Watkins 1989
    • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992