Enhancing Reasoning Capabilities in SLMs with Reward Guided Dataset Distillation
Shreyansh Padarha

TL;DR
This paper introduces AdvDistill, a reward-guided dataset distillation method that improves small language models' reasoning abilities by leveraging multiple teacher responses and reward-based weighting, enhancing generalisability and performance on reasoning tasks.
Contribution
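The paper proposes AdvDistill, a novel reward-guided dataset distillation framework that enhances the reasoning capabilities of small language models. It draws multiple teacher responses for each prompt, scores them with rule-based verifiers, and uses the resulting rewards as weights when training the student.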
Findings
Significant improvement in reasoning task performance
Effective use of reward-based weighting in distillation
Enhanced generalisability of small language models
Abstract
The push to compress and impart the proficiency of Large Language Models (LLMs) into more deployable and efficient Small Language Models (SLMs) has benefited from improvements in knowledge distillation (KD) techniques. These techniques allow a smaller student model to learn from a more capable and larger teacher model's responses. However, distillation often revolves around the student model merely copying the teacher's in-distribution responses, limiting its generalisability. This limitation is amplified on reasoning tasks and can be computationally expensive. In this study, we propose AdvDistill, a reward-guided dataset distillation framework. We utilise multiple generations (responses) from a teacher for each prompt and assign rewards based on rule-based verifiers. These varying and normally distributed rewards serve as weights when training student models. Our methods and their…
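The reward-weighting idea described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the verifier rule (`rule_based_reward`), the softmax normalisation in `reward_weights`, the `temperature` parameter, and the example responses are all assumptions made for illustration, since the summary does not specify how rewards are computed or turned into weights.

```python
import math

def rule_based_reward(response: str, reference: str) -> float:
    """Hypothetical rule-based verifier (an assumption, not the paper's rules).

    Gives 0.2 for using an 'Answer:' marker and 1.0 more if the text after
    the last marker matches the reference answer exactly.
    """
    has_format = "Answer:" in response
    correct = False
    if has_format:
        final = response.rsplit("Answer:", 1)[1].strip()
        correct = final == reference
    return 0.2 * float(has_format) + 1.0 * float(correct)

def reward_weights(rewards: list[float], temperature: float = 1.0) -> list[float]:
    """Turn the rewards for one prompt into weights that sum to 1.

    Softmax is an assumed choice; higher-reward responses get larger weights.
    """
    scaled = [r / temperature for r in rewards]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_loss(per_response_nll: list[float], weights: list[float]) -> float:
    """Reward-weighted average of the student's per-response losses."""
    return sum(w * l for w, l in zip(weights, per_response_nll))

# Three made-up teacher responses to one prompt whose reference answer is "4".
teacher_responses = ["2 + 2 = 4. Answer: 4", "Answer: 5", "it is four"]
rewards = [rule_based_reward(r, "4") for r in teacher_responses]
weights = reward_weights(rewards)

# Placeholder student losses (negative log-likelihoods), one per response.
student_nll = [0.9, 1.4, 2.0]
loss = weighted_loss(student_nll, weights)
```

In a real training loop the placeholder `student_nll` values would come from the student model's token-level loss on each teacher response, so that well-rewarded responses contribute more to the gradient than poorly rewarded ones.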
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
