Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty