Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents
Ashley Lewis, Michael White, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

TL;DR
This paper compares knowledge distillation and self-training methods for reducing hallucination in product QA agents, demonstrating that synthetic data and self-training can achieve cost-effective, scalable improvements in model reliability.
Contribution
It introduces a retrieval-augmented QA pipeline and shows that self-training can match knowledge distillation in hallucination reduction, challenging the assumption that fine-tuning on a stronger model's outputs is necessarily more effective.
Findings
Synthetic data outperforms crowdsourced data in hallucination reduction.
Self-training achieves comparable hallucination mitigation to knowledge distillation.
Contextualized 'I don't know' responses improve robustness to unanswerable questions.
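The self-training versus knowledge distillation comparison comes down to which model writes the fine-tuning targets. The sketch below illustrates only that distinction and is not the authors' implementation: the prompt template, the function `build_finetuning_set`, and the toy retriever and generators are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases; the paper's actual interfaces are not given here.
Generator = Callable[[str], str]  # prompt -> answer text
Retriever = Callable[[str], str]  # question -> retrieved manual passage


def build_finetuning_set(
    questions: List[str],
    retrieve: Retriever,
    label_with: Generator,
) -> List[Dict[str, str]]:
    """Pair each retrieval-augmented prompt with a target written by `label_with`."""
    examples = []
    for question in questions:
        passage = retrieve(question)
        # Assumed prompt layout, chosen only for illustration.
        prompt = f"Context: {passage}\nQuestion: {question}\nAnswer:"
        examples.append({"prompt": prompt, "target": label_with(prompt)})
    return examples


# Toy stand-ins so the sketch runs without any model or index.
def toy_retrieve(question: str) -> str:
    return "Press the Home button to open the menu."


def toy_student(prompt: str) -> str:
    return "student: press Home"


def toy_teacher(prompt: str) -> str:
    return "teacher: press the Home button"


questions = ["How do I open the menu?"]

# Self-training: the student model labels its own fine-tuning data.
self_training_set = build_finetuning_set(questions, toy_retrieve, toy_student)

# Knowledge distillation: a stronger teacher model (GPT-4o in the paper) labels the data.
distillation_set = build_finetuning_set(questions, toy_retrieve, toy_teacher)
```

In both regimes the prompts are identical and only the source of the targets changes, so any difference in the fine-tuned models comes from who wrote the answers.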
Abstract
The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination (generating false information) and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models' outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing
Methods: Knowledge Distillation · High-Order Consensuses
