Task-Specific Knowledge Distillation via Intermediate Probes

Ryan Brown; Chris Russell

arXiv:2603.12270·cs.CL·March 16, 2026

TL;DR

This paper introduces ProbeDistill, a knowledge distillation method that trains probes on the intermediate representations of a frozen teacher model and uses their predictions as a cleaner supervision signal, improving student performance on reasoning tasks, especially when training data is limited.

Contribution

It proposes a simple, architecture-agnostic distillation framework that leverages intermediate representations via probes, bypassing noisy output distributions and enhancing performance on reasoning benchmarks.

Findings

01

Consistent improvements across four reasoning benchmarks, with the largest gains under limited data.

02

Probes provide cleaner, denoised supervision signals.

03

Method requires no architectural changes and is computationally efficient.

Abstract

Knowledge distillation from large language models (LLMs) assumes that the teacher's output distribution is a high-quality training signal. On reasoning tasks, this assumption is frequently violated. A model's intermediate representations may encode the correct answer, yet this information is lost or distorted through the vocabulary projection, where prompt formatting and answer-token choices create brittle, noisy outputs. We introduce ProbeDistill, a distillation framework that bypasses this bottleneck by training lightweight probes on frozen teacher hidden states and using the probe's predictions, rather than output logits, as supervision for student training. This simple change yields consistent improvements across four reasoning benchmarks (AQuA-RAT, ARC Easy/Challenge, and MMLU), with gains most pronounced under limited data. Probes trained on intermediate representations provide…
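
The two-stage recipe described in the abstract (fit a probe on frozen teacher hidden states, then train the student on the probe's predictions) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the probe architecture, the teacher layer, or the loss, so the sketch assumes linear softmax probes, a cross-entropy objective, and synthetic NumPy arrays standing in for teacher hidden states and student inputs. All names (`fit_softmax`, `teacher_hidden`, `probe_targets`, and so on) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fit_softmax(features, targets, steps=500, lr=0.1):
    """Fit a linear softmax classifier by gradient descent on cross-entropy.

    `targets` holds one probability row per example (one-hot or soft).
    """
    n, d = features.shape
    weights = np.zeros((d, targets.shape[1]))
    bias = np.zeros(targets.shape[1])
    for _ in range(steps):
        probs = softmax(features @ weights + bias)
        grad_logits = (probs - targets) / n  # gradient of the mean cross-entropy
        weights -= lr * features.T @ grad_logits
        bias -= lr * grad_logits.sum(axis=0)
    return weights, bias


rng = np.random.default_rng(0)
n, d_in, d_hidden, k = 400, 8, 32, 4  # examples, input dim, hidden dim, answer options

# Synthetic stand-ins (assumptions, not data from the paper).
student_inputs = rng.normal(size=(n, d_in))
labels = (student_inputs @ rng.normal(size=(d_in, k))).argmax(axis=1)

# "Frozen teacher": a fixed random feature map that is never updated.
teacher_proj = rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in)
teacher_hidden = np.tanh(student_inputs @ teacher_proj)

# Stage 1: train a lightweight probe on the frozen teacher hidden states.
probe_w, probe_b = fit_softmax(teacher_hidden, np.eye(k)[labels])
probe_targets = softmax(teacher_hidden @ probe_w + probe_b)

# Stage 2: train the student on the probe's soft predictions
# instead of the teacher's output logits.
student_w, student_b = fit_softmax(student_inputs, probe_targets)
student_pred = softmax(student_inputs @ student_w + student_b).argmax(axis=1)

probe_accuracy = float((probe_targets.argmax(axis=1) == labels).mean())
student_accuracy = float((student_pred == labels).mean())
print(f"probe accuracy:   {probe_accuracy:.3f}")
print(f"student accuracy: {student_accuracy:.3f}")
```

In a real setting the hidden states would come from a chosen layer of the frozen LLM and the student would be a smaller language model; the sketch only shows how the supervision signal is routed through the probe rather than through the vocabulary projection.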

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications