Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence

Shaozhen Shi; Yevgen Matusevych; Malvina Nissim

arXiv:2410.22081 · cs.CL · October 30, 2024

TL;DR

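This paper introduces a mode-seeking distillation method that trains the student in a teacher-student setup with a reverse KL divergence objective. On the Strict-Small track of the 2nd BabyLM Challenge, a single teacher often matches or outperforms a two-teacher ensemble, and additional optimization strategies further improve performance.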
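For reference, the two directions of the KL divergence can be written for a single next-token prediction. The notation is ours, assuming a teacher distribution p_T and a student distribution q_θ over a vocabulary V given the prefix x_{<t}; this is the standard textbook formulation, not necessarily the paper's exact objective.

```latex
\begin{aligned}
\mathrm{KL}(p_T \,\|\, q_\theta) &= \sum_{v \in V} p_T(v \mid x_{<t}) \log \frac{p_T(v \mid x_{<t})}{q_\theta(v \mid x_{<t})} && \text{(forward, mode-covering)} \\
\mathrm{KL}(q_\theta \,\|\, p_T) &= \sum_{v \in V} q_\theta(v \mid x_{<t}) \log \frac{q_\theta(v \mid x_{<t})}{p_T(v \mid x_{<t})} && \text{(reverse, mode-seeking)}
\end{aligned}
```

The reverse form weights each log-ratio by the student's own probability, so the student is penalized heavily for placing mass on tokens the teacher considers unlikely but only lightly for ignoring some of the teacher's plausible tokens. Minimizing it therefore tends to concentrate the student on one high-probability mode instead of spreading mass over all of them.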

Contribution

We propose using reverse KL divergence as the distillation objective to induce mode-seeking behavior in the student, and show that it is effective with a single teacher and when combined with additional optimization strategies.
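The sketch below compares the two directions of KL on one next-token distribution. It is a minimal pure-Python illustration, assuming raw logits for a single position and an optional softmax temperature; the function names and the 3-token example are ours, not taken from the authors' code.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kl(log_a: List[float], log_b: List[float]) -> float:
    """KL(a || b) for two distributions given as log-probabilities over one vocabulary."""
    return sum(math.exp(la) * (la - lb) for la, lb in zip(log_a, log_b))


def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student): expectation under the teacher (mode-covering)."""
    return kl(log_softmax(teacher_logits, temperature),
              log_softmax(student_logits, temperature))


def reverse_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(student || teacher): expectation under the student (mode-seeking)."""
    return kl(log_softmax(student_logits, temperature),
              log_softmax(teacher_logits, temperature))


# Hypothetical 3-token vocabulary: the teacher is split between tokens 0 and 2,
# while the student commits almost entirely to token 0.
teacher = [4.0, 0.0, 4.0]
student = [5.0, 0.0, 0.0]
print(f"forward KL: {forward_kl(teacher, student):.3f}")
print(f"reverse KL: {reverse_kl(teacher, student):.3f}")
```

On this example the forward KL is roughly 1.80 and the reverse KL roughly 0.65: ignoring token 2 costs the student much less under the reverse direction. In actual training the same quantity would be computed for every token position and averaged, typically on tensors rather than Python lists.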

Findings

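1. Under reverse KL divergence, single-teacher models often outperform or match multiple-teacher models across most tasks.
2. Reverse KL divergence encourages mode-seeking (rather than mode-averaging) behavior in the student.
3. Additional optimization techniques further improve distillation performance.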
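The mode-seeking property of reverse KL can be illustrated with a toy experiment. The setup is our own assumption, not the paper's: a 61-bin support, a bimodal "teacher" with two equal narrow peaks at 20 and 40, and a single discretized Gaussian as the "student" family. A grid search picks the best-fitting Gaussian under each divergence direction.

```python
import math
from typing import List, Tuple

BINS = range(61)  # assumed toy support: integers 0..60
SIGMAS = [0.5 * k for k in range(1, 41)]  # candidate widths 0.5 .. 20.0


def log_normalize(log_w: List[float]) -> List[float]:
    """Turn unnormalized log-weights into log-probabilities (log-sum-exp)."""
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(w - m) for w in log_w))
    return [w - log_z for w in log_w]


def log_gaussian(mu: float, sigma: float) -> List[float]:
    """A single-mode 'student': a Gaussian discretized onto BINS."""
    return log_normalize([-((x - mu) ** 2) / (2 * sigma ** 2) for x in BINS])


def logaddexp(a: float, b: float) -> float:
    """Stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


# Bimodal 'teacher': equal mixture of two narrow peaks at 20 and 40.
TARGET = [math.log(0.5) + logaddexp(a, b)
          for a, b in zip(log_gaussian(20, 2.0), log_gaussian(40, 2.0))]


def kl(log_a: List[float], log_b: List[float]) -> float:
    """KL(a || b) from log-probabilities."""
    return sum(math.exp(la) * (la - lb) for la, lb in zip(log_a, log_b))


def best_fit(reverse: bool) -> Tuple[int, float]:
    """Grid-search the Gaussian (mu, sigma) minimizing forward or reverse KL."""
    best, best_loss = (0, SIGMAS[0]), math.inf
    for mu in BINS:
        for sigma in SIGMAS:
            q = log_gaussian(mu, sigma)
            loss = kl(q, TARGET) if reverse else kl(TARGET, q)
            if loss < best_loss:
                best, best_loss = (mu, sigma), loss
    return best


print("forward KL fit (mu, sigma):", best_fit(reverse=False))
print("reverse KL fit (mu, sigma):", best_fit(reverse=True))
```

Forward KL places the single Gaussian at 30, between the peaks, with a wide spread so that both are covered (mode-averaging), while reverse KL locks onto one peak with about that peak's width (mode-seeking). This only illustrates the divergence's behavior; it does not reproduce the paper's experiments.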

Abstract

This study presents our submission to the Strict-Small Track of the 2nd BabyLM Challenge. We use a teacher-student distillation setup with the BabyLLaMa model (Timiryasov and Tastet, 2023) as a backbone. To make the student's learning process more focused, we replace the objective function with a reverse Kullback-Leibler divergence, known to cause mode-seeking (rather than mode-averaging) behaviour in computational learners. We further experiment with having a single teacher (instead of an ensemble of two teachers) and implement additional optimization strategies to improve the distillation process. Our experiments show that under reverse KL divergence, a single-teacher model often outperforms or matches multiple-teacher models across most tasks. Additionally, incorporating advanced optimization techniques further enhances model performance, demonstrating the effectiveness and…
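The abstract describes the training objective only at a high level. As a hypothetical sketch, a BabyLlama-style distillation loss can be written as a weighted sum of hard-label cross-entropy and a soft term, here the reverse KL averaged over the teachers. The weight `alpha`, the `temperature`, and the per-teacher averaging are our assumptions, not details confirmed by the abstract; a single-teacher setup simply passes a one-element list.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax via the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def reverse_kl(teacher_logits, student_logits, temperature):
    """KL(student || teacher) for one token position."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lq) * (lq - lp) for lp, lq in zip(log_p, log_q))


def distillation_loss(student_logits, teacher_logits_list, target_index,
                      alpha=0.5, temperature=2.0):
    """Hypothetical objective: alpha * hard-label cross-entropy
    + (1 - alpha) * reverse KL averaged over all teachers.

    One teacher -> one-element list; a two-teacher ensemble -> two entries.
    alpha and temperature are assumed values, not the paper's settings.
    """
    hard = -log_softmax(student_logits)[target_index]
    soft = sum(reverse_kl(t, student_logits, temperature)
               for t in teacher_logits_list) / len(teacher_logits_list)
    return alpha * hard + (1 - alpha) * soft


# Hypothetical logits over a 3-token vocabulary; the gold next token is 0.
student = [2.0, 0.5, -1.0]
teacher_a = [3.0, 0.0, -2.0]
teacher_b = [0.0, 2.5, -1.0]
print("single teacher:", distillation_loss(student, [teacher_a], target_index=0))
print("two teachers:  ", distillation_loss(student, [teacher_a, teacher_b], target_index=0))
```

Averaging per-teacher divergences is only one way to combine an ensemble; averaging the teachers' distributions before computing a single divergence is another, and the abstract does not say which the original setup used.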
