Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training