TL;DR
This paper introduces a two-stage knowledge distillation framework guided by cognitive uncertainty to improve student misconception classification, achieving high accuracy with limited data and small models.
Contribution
It proposes a novel dual-layer marginal selection mechanism based on cognitive uncertainty for effective sample mining in knowledge distillation.
Findings
Achieved 0.9585 MAP@3 on the MAP-Charting dataset using only a filtered 10.30% of the samples.
Attained 84.38% accuracy on middle school algebra misconceptions with a 4B-parameter model.
Significantly outperformed state-of-the-art LLMs and fine-tuned models.
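For readers unfamiliar with the MAP@3 metric reported above, the following is a minimal sketch of how it is commonly computed. It assumes each sample has exactly one correct label and the model returns a ranked list of predictions; the function name `map_at_3` and the toy labels are illustrative and not taken from the paper.

```python
def map_at_3(true_labels, ranked_predictions):
    """Mean average precision at 3, assuming one correct label per sample.

    A sample scores 1/rank if its true label appears at position `rank`
    (1-based) within the top three predictions, and 0 otherwise.
    """
    if not true_labels:
        raise ValueError("true_labels must not be empty")
    total = 0.0
    for truth, preds in zip(true_labels, ranked_predictions):
        # Only the first three ranked predictions are considered.
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)


# Toy example with hypothetical misconception labels.
truths = ["sign_error", "distribution", "order_of_ops"]
preds = [
    ["sign_error", "distribution", "other"],    # hit at rank 1 -> 1.0
    ["other", "distribution", "sign_error"],    # hit at rank 2 -> 0.5
    ["other", "sign_error", "distribution"],    # miss          -> 0.0
]
score = map_at_3(truths, preds)
```

Under this definition, a score near 1.0 means the correct category is almost always the top-ranked prediction.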
Abstract
Accurately identifying student misconceptions is crucial for personalized education but faces three challenges: (1) data scarcity with a long-tail distribution, where authentic student reasoning is difficult to synthesize; (2) fuzzy boundaries between error categories with high annotation noise; (3) a deployment paradox: large models overlook unconventional approaches due to pretraining bias and cannot be deployed on edge devices, while small models overfit to noise. Unlike traditional methods that increase diversity through large-scale data synthesis, we propose a two-stage knowledge distillation framework that mines high-value samples from existing data. The first stage performs standard distillation to transfer task capabilities. The second stage introduces a dual-layer marginal selection mechanism based on cognitive uncertainty, identifying four types of critical samples based on teacher model…
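The abstract does not specify what "standard distillation" means in the first stage. As one possible reading, the sketch below shows the classic soft-label objective: a temperature-softened KL divergence between teacher and student class distributions. The function names, the temperature value, and the example logits are all assumptions made for illustration; the paper may use a different objective, such as fine-tuning on teacher-generated outputs.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is scaled by temperature**2, the usual convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2


# Hypothetical logits over three misconception categories.
teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
diverging_student = [-1.0, 0.5, 2.0]

loss_match = distillation_loss(matching_student, teacher)
loss_diverge = distillation_loss(diverging_student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. The second-stage sample selection is not sketched here because the abstract excerpt is truncated before describing it.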
