Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering
Bolei He, Xinran He, Run Shao, Shanfu Shu, Xianwei Xue, Mingquan Cheng, Haifeng Li, Zhenhua Ling

TL;DR
This paper introduces Select2Know (S2K), a cost-effective framework for domain-specific question answering that combines a model's internal knowledge with external knowledge, improving accuracy while reducing cost.
Contribution
The paper proposes a novel internal-external knowledge self-selection framework with structured reasoning data generation and selective fine-tuning for domain-specific QA.
Findings
S2K outperforms existing methods on medical, legal, and financial QA benchmarks.
S2K matches domain-pretrained LLMs at significantly lower cost.
The framework effectively utilizes long-tail domain knowledge.
Abstract
Large Language Models (LLMs) perform well in general QA but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency due to noisy retrievals. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Select2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
