On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers