A Study on Hidden Layer Distillation for Large Language Model Pre-Training