Domain Specific Specialization in Low-Resource Settings: The Efficacy of Offline Response-Based Knowledge Distillation in Large Language Models
Erdem Aslan, Pakize Erdoğmuş

TL;DR
This paper introduces an offline response-based knowledge distillation method to develop domain-specific large language model assistants efficiently in low-resource environments, emphasizing data quality over quantity.
Contribution
It proposes a novel distillation approach using a small, context-aware synthetic dataset and demonstrates its effectiveness in reducing hallucinations and improving accuracy.
Findings
A 500-line context-aware dataset achieves 96.7% accuracy
Larger unstructured datasets do not significantly reduce hallucinations
Data quality and structural alignment are crucial for domain adaptation
Abstract
Large Language Models (LLMs) excel in general tasks but often struggle with hallucinations when handling domain-specific or institutional knowledge absent from their pre-training. We present an offline response-based knowledge distillation method that develops high-accuracy specialized assistants under constrained hardware resources. We evaluate three distinct data strategies: general domain adaptation (15,000 lines), unstructured knowledge injection (2,000 lines), and a context-aware synthetic dataset (500 lines) generated by a teacher model. To minimize computational costs, we utilize the Unsloth library to optimize the Qwen-2.5-7B student model, reducing NVIDIA A100 GPU memory requirements from 40 GB to 16 GB. Experimental results demonstrate that while larger unstructured datasets suffer from persistent hallucinations, the 500-line context-aware dataset achieves a 96.7% accuracy…
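To make the offline, response-based setup in the abstract concrete, the following is a minimal sketch of how a small context-aware dataset could be assembled: the teacher is queried once per context and question, and only its text responses are stored for later fine-tuning of the student. Everything here is an assumption for illustration and not the authors' pipeline: the prompt template, the `instruction`/`response` field names, the sample rows, and `stub_teacher`, which stands in for a real teacher model call.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_distillation_records(
    items: Iterable[Dict[str, str]],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Query the teacher once per item and keep only its text response.

    "Offline" means the teacher is not needed during student training;
    "response-based" means only the teacher's output text is stored,
    not its logits or hidden states.
    """
    records = []
    for item in items:
        # Hypothetical prompt template: the context is placed in front of
        # the question so the stored example is context-aware.
        prompt = f"Context:\n{item['context']}\n\nQuestion: {item['question']}"
        records.append({"instruction": prompt, "response": teacher(prompt)})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def stub_teacher(prompt: str) -> str:
    """Placeholder for a real teacher model call (assumption, not a real API)."""
    return "ANSWER(" + prompt.splitlines()[-1] + ")"


# Illustrative rows only; these are not taken from the paper's dataset.
SAMPLE = [
    {"context": "The library opens at 09:00.", "question": "When does the library open?"},
    {"context": "Exams are held in June.", "question": "When are exams held?"},
]

RECORDS = build_distillation_records(SAMPLE, stub_teacher)
JSONL = to_jsonl(RECORDS)
```

In a real run, `stub_teacher` would be replaced by a call to the chosen teacher model, and the resulting JSONL file would be the supervised fine-tuning input for the student.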
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
