When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets?

Srikrishna Iyer

arXiv:2411.16487·cs.CL·November 26, 2024

TL;DR

This paper explores a teacher-less, student knowledge sharing approach for data-efficient language model pretraining, demonstrating it can match or outperform traditional teacher-guided methods on small datasets.

Contribution

Introduces a dynamic weighted mutual learning framework that eliminates the need for a teacher model, improving data efficiency in language model pretraining.

Findings

01. Teacher-less methods match or surpass teacher-supervised approaches.

02. Dynamic weighting improves knowledge distillation effectiveness.

03. Bi-level optimization enhances student diversity and performance.

Abstract

We present our submission to the BabyLM challenge, aiming to push the boundaries of data-efficient language model pretraining. Our method builds upon deep mutual learning, introducing a student model search for diverse initialization. We address the limitation of treating students equally by formulating weighted mutual learning as a bi-level optimization problem. The inner loop learns compact students through online distillation, while the outer loop optimizes weights for better knowledge distillation from diverse students. This dynamic weighting strategy eliminates the need for a teacher model, reducing computational requirements. Our evaluations show that teacher-less methods can match or surpass teacher-supervised approaches.
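
To make the inner-loop idea in the abstract concrete, here is a minimal sketch of a weighted mutual learning loss for a single training example. The abstract does not give the exact objective, so everything here is an assumption for illustration: the function names, the mixing coefficient `alpha`, and the specific form (a weighted cross-entropy term plus weighted KL terms in which each student mimics its peers). The outer loop that learns the weights is not shown; the weights are simply treated as fixed values that sum to 1.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label."""
    return -math.log(max(probs[label], 1e-12))

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q is clamped to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)

def weighted_mutual_loss(all_logits, label, weights, alpha=0.5):
    """Hypothetical weighted mutual learning loss for one example.

    all_logits: one list of logits per student.
    weights:    one importance weight per student (assumed to sum to 1).
    alpha:      assumed trade-off between supervised and peer-mimicry terms.
    """
    probs = [softmax(z) for z in all_logits]

    # Supervised term: each student's cross-entropy, scaled by its weight.
    supervised = sum(w * cross_entropy(p, label) for w, p in zip(weights, probs))

    # Mimicry term: student i is pulled toward peer j in proportion to
    # peer j's weight, so more trusted peers act as stronger "teachers".
    mimicry = 0.0
    for i, p_i in enumerate(probs):
        for j, p_j in enumerate(probs):
            if i != j:
                mimicry += weights[j] * kl_divergence(p_j, p_i)

    return (1 - alpha) * supervised + alpha * mimicry

# Two students that disagree on a 3-way prediction; the gold label is class 0.
loss = weighted_mutual_loss([[2.0, 0.5, 0.1], [0.3, 1.5, 0.2]], label=0, weights=[0.7, 0.3])
```

With equal weights this reduces to ordinary deep mutual learning, where every peer counts the same. Under the framing in the abstract, the outer loop would adjust `weights` so that more useful students contribute more to the distillation signal.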

Code & Models

Repositories

ai-da-stc/generative-ai-research-babylm (Official)

Taxonomy

Topics: Innovative Teaching and Learning Methods · Online and Blended Learning · Educational Assessment and Improvement

Methods: Knowledge Distillation