• Arash Ahmadian on Rethinking RLHF

  • Mar 25 2024
  • Duration: 34 min
  • Podcast

  • Summary

  • Arash Ahmadian is a researcher at Cohere and Cohere For AI focused on preference training of large language models. He’s also a researcher at the Vector Institute of AI.

    Featured Reference

    Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

    Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker


    Additional References

    • Self-Rewarding Language Models, Yuan et al 2024
    • Reinforcement Learning: An Introduction, Sutton and Barto 1998
    • Learning from Delayed Rewards, Chris Watkins 1989
    • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992
