Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning
Vivswan Shah, Randy Cogill, Hanwei Yue, Gopinath Chennupati, Rinat Khaziev

TL;DR
This paper demonstrates that attaching explanations, even random or incoherent ones, during fine-tuning improves language model classification by acting as a regularizer and encouraging richer intermediate computation.
Contribution
It introduces a novel fine-tuning approach that uses explanations, including pseudo-explanations, to enhance model accuracy and interpretability in classification tasks.
Findings
Explanation-augmented training outperforms label-only baselines across datasets.
Pseudo-explanations improve accuracy despite lacking semantic content.
Explanation-based regularization increases model deliberation and reduces overconfidence.
Abstract
Fine-tuning LLMs for classification typically maps inputs directly to labels. We ask whether attaching brief explanations to each label during fine-tuning yields better models. We evaluate conversational response quality along three axes: naturalness, comprehensiveness, and on-topic adherence, each rated on a 5-point scale. Using ensemble-generated data from multiple LLMs, we fine-tune a 7B-parameter model and test across six diverse conversational datasets. Across 18 dataset-task settings (six datasets, three tasks each), label-plus-explanation training outperforms label-only baselines. A central and unexpected result concerns random tokens. We replace human-written explanations with text that is syntactically incoherent yet vocabulary-aligned with the originals (e.g., shuffled or bag-of-words variants). Despite lacking semantics, these pseudo-explanations still improve accuracy over label-only training and often…
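To make the three training conditions in the abstract concrete, the following is a minimal sketch of how label-only, label-plus-explanation, and shuffled pseudo-explanation targets might be built for one example. The function names, the "Rating:" / "Explanation:" template, the label-before-explanation ordering, and whitespace tokenization are all assumptions made for illustration; the abstract does not specify the authors' actual prompt format or tokenizer.

```python
import random


def label_only_target(label: int) -> str:
    # Baseline target: the model is trained to emit only the 1-5 rating.
    return f"Rating: {label}"


def label_plus_explanation_target(label: int, explanation: str) -> str:
    # Explanation-augmented target. The template and the ordering
    # (label first, explanation second) are hypothetical choices.
    return f"Rating: {label}\nExplanation: {explanation}"


def shuffled_pseudo_explanation(explanation: str, seed: int = 0) -> str:
    # Pseudo-explanation: same vocabulary as the original, but with word
    # order destroyed. Whitespace splitting stands in for a real tokenizer.
    tokens = explanation.split()
    rng = random.Random(seed)  # seeded so the shuffle is reproducible
    rng.shuffle(tokens)
    return " ".join(tokens)


if __name__ == "__main__":
    explanation = "The reply is fluent and stays on topic"
    print(label_only_target(4))
    print(label_plus_explanation_target(4, explanation))
    print(label_plus_explanation_target(4, shuffled_pseudo_explanation(explanation)))
```

In this sketch the shuffled variant keeps exactly the same multiset of words as the original explanation, which is one simple reading of "vocabulary-aligned"; a bag-of-words variant could be built the same way with a different reordering rule.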
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods
