Loading paper
Post-Trained MoE Can Skip Half Experts via Self-Distillation | Tomesphere