Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers