Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents

Ashley Lewis; Michael White; Jing Liu; Toshiaki Koike-Akino; Kieran Parsons; Ye Wang

arXiv:2502.19545 · cs.CL · July 22, 2025

TL;DR

This paper compares knowledge distillation and self-training methods for reducing hallucination in product QA agents, demonstrating that synthetic data and self-training can achieve cost-effective, scalable improvements in model reliability.

Contribution

It introduces a retrieval-augmented QA pipeline and shows that self-training can match knowledge distillation in hallucination reduction, challenging the expectation that fine-tuning on a stronger model's outputs is the more effective of the two.

Findings

01 Fine-tuning on LLM-generated synthetic data reduces hallucination more than fine-tuning on crowdsourced data.

02 Self-training reduces hallucination about as much as knowledge distillation does.

03 Contextualized 'I don't know' responses improve robustness to unanswerable questions.

Abstract

The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination (generating false information) and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in fine-tuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models' outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this…
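The abstract distinguishes the two methods by where the fine-tuning targets come from: the model's own outputs (self-training) or a stronger model's outputs (knowledge distillation). The following is a minimal sketch of that distinction only, not the authors' pipeline. Every name in it (`build_finetuning_set`, `toy_retrieve`, `toy_student`, `toy_teacher`) is hypothetical, the models and retriever are replaced by string-returning stand-ins, and any filtering or selection of generated answers is left out because the excerpt does not describe it.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases for illustration; not taken from the paper.
Retriever = Callable[[str], str]       # question -> retrieved manual passage
Generator = Callable[[str, str], str]  # (question, passage) -> answer


def build_finetuning_set(
    questions: List[str], retrieve: Retriever, generate: Generator
) -> List[Dict[str, str]]:
    """Pair each question with its retrieved passage and a generated answer."""
    examples = []
    for question in questions:
        passage = retrieve(question)
        answer = generate(question, passage)
        examples.append(
            {"question": question, "context": passage, "answer": answer}
        )
    return examples


# Toy stand-ins so the sketch runs without any model or retrieval index.
def toy_retrieve(question: str) -> str:
    return "Manual passage relevant to: " + question


def toy_student(question: str, passage: str) -> str:
    return "[student] answer grounded in: " + passage


def toy_teacher(question: str, passage: str) -> str:
    return "[teacher] answer grounded in: " + passage


questions = ["How do I pair the remote with the TV?"]

# Knowledge distillation: targets come from the stronger model.
distillation_set = build_finetuning_set(questions, toy_retrieve, toy_teacher)

# Self-training: targets come from the model that will itself be fine-tuned.
self_training_set = build_finetuning_set(questions, toy_retrieve, toy_student)
```

In both cases the smaller model is the one fine-tuned on the resulting examples; only the `generate` argument changes. Under this framing, self-training targets are drawn from the small model's own output distribution, while distillation targets are not.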

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing

Methods: Knowledge Distillation · High-Order Consensuses