Loading paper
MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization | Tomesphere