• Zero Bubble Pipeline Parallelism

  • Jul 8 2024
  • Length: Less than 1 minute
  • Podcast

  • Summary

  • The core idea is to split the backward pass into two parts: one that computes the gradient with respect to the parameters, and one that computes the gradient with respect to the layer's input (the output of the previous layer). The work is then scheduled so that each pipeline stage is always computing instead of sitting idle (the bubble). Read the full paper: https://arxiv.org/abs/2401.10241 Tags: Systems and Performance, Deep Learning, Machine Learning
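To make the split concrete, here is a minimal sketch for a single linear layer y = x @ w, using plain Python lists in place of tensors. The helper names (`backward_input`, `backward_weight`) are hypothetical and chosen for illustration; they are not taken from the paper or from any library. The point is only that the two gradients can be computed by separate calls, so a scheduler is free to run the input-gradient step right away (the previous stage is waiting on it) and defer the weight-gradient step to fill otherwise idle time.

```python
# Illustrative sketch only: hypothetical helper names, lists stand in for tensors.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def forward(x, w):
    """Linear layer: y = x @ w."""
    return matmul(x, w)

def backward_input(grad_y, w):
    """Gradient wrt the layer input: grad_x = grad_y @ w^T.
    The previous pipeline stage needs this to continue its own backward pass."""
    return matmul(grad_y, transpose(w))

def backward_weight(x, grad_y):
    """Gradient wrt the parameters: grad_w = x^T @ grad_y.
    No other stage depends on this, so it can be deferred."""
    return matmul(transpose(x), grad_y)

x = [[1.0, 2.0]]                          # one sample, two features
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]    # 2 x 3 weight matrix
y = forward(x, w)

grad_y = [[1.0, 1.0, 1.0]]                # gradient arriving from the next stage
grad_x = backward_input(grad_y, w)        # run immediately, send upstream
grad_w = backward_weight(x, grad_y)       # may run later, when the stage would idle

print(grad_x)  # [[3.0, 4.0]]
print(grad_w)  # [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
```

In a conventional pipeline schedule both gradients are computed together as one backward step; separating them is what gives the scheduler the extra freedom described in the summary.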
