Enhancing Reasoning Capabilities in SLMs with Reward Guided Dataset Distillation

Shreyansh Padarha

arXiv:2507.00054·cs.AI·July 2, 2025

TL;DR

This paper introduces AdvDistill, a reward-guided dataset distillation method that improves small language models' reasoning abilities by leveraging multiple teacher responses and reward-based weighting, enhancing generalisability and performance on reasoning tasks.

Contribution

The paper proposes AdvDistill, a reward-guided dataset distillation framework that improves the reasoning capabilities of small language models by training on multiple teacher responses per prompt, weighted by rewards from rule-based verifiers.

Findings

01. Significant improvement in reasoning task performance

02. Effective use of reward-based weighting in distillation

03. Enhanced generalisability of small language models

Abstract

The push to compress and impart the proficiency of Large Language Models (LLMs) into more deployable and efficient Small Language Models (SLMs) has benefited from improvements in knowledge distillation (KD) techniques. These techniques allow a smaller student model to learn from a more capable and larger teacher model's responses. However, distillation often revolves around the student model merely copying the teacher's in-distribution responses, limiting its generalisability. This limitation is amplified on reasoning tasks and can be computationally expensive. In this study, we propose AdvDistill, a reward-guided dataset distillation framework. We utilise multiple generations (responses) from a teacher for each prompt and assign rewards based on rule-based verifiers. These varying and normally distributed rewards serve as weights when training student models. Our methods and their…
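The abstract's pipeline (several teacher responses per prompt, a rule-based reward for each, rewards used as training weights) can be illustrated with a minimal Python sketch. The source does not give the verifier rules or the weighting formula, so everything here is an assumption made for illustration: the function names `rule_based_reward`, `reward_weights`, and `weighted_nll`, the "last number matches plus format bonus" rule, and the softmax over group-standardised rewards are hypothetical and are not taken from the paper.

```python
import math
import re


def rule_based_reward(response: str, gold_answer: str) -> float:
    """Hypothetical verifier (assumption, not the paper's rules):
    1.0 if the last number in the response equals the gold answer,
    plus a 0.2 bonus if the response uses an 'Answer:' marker."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    correct = 1.0 if numbers and numbers[-1] == gold_answer else 0.0
    formatted = 0.2 if "Answer:" in response else 0.0
    return correct + formatted


def reward_weights(rewards: list[float], temperature: float = 1.0) -> list[float]:
    """Turn per-response rewards for ONE prompt into weights that sum to 1.
    Assumed scheme: standardise within the group, then apply a softmax."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std == 0.0:
        # All responses scored the same: fall back to uniform weights.
        return [1.0 / n] * n
    z = [(r - mean) / std for r in rewards]
    z_max = max(z)  # subtract the max for numerical stability
    exps = [math.exp((v - z_max) / temperature) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_nll(token_logprobs: list[list[float]], weights: list[float]) -> float:
    """Reward-weighted loss for one prompt: each response contributes its
    mean token negative log-likelihood under the student, scaled by its weight."""
    loss = 0.0
    for logprobs, weight in zip(token_logprobs, weights):
        loss += weight * (-sum(logprobs) / len(logprobs))
    return loss


# Three hypothetical teacher responses to a prompt whose gold answer is "4".
responses = ["2 + 2 = 4. Answer: 4", "I think it is 5", "The result is 4"]
rewards = [rule_based_reward(r, "4") for r in responses]
weights = reward_weights(rewards)
```

In this sketch, the correct and well-formatted response receives the largest weight and the incorrect one the smallest, so the student's loss is dominated by the teacher responses the verifier prefers rather than by a single teacher output.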

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)