For session 8 of the EleutherAI ML Scalability & Performance reading group, I presented the Megatron-LM paper, which introduced tensor parallelism.

My annotated version of this paper can be found on my GitHub here.

Papers:

  1. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Recording:

ML Scalability & Performance Reading Group Session 8: Megatron-LM