Read the Docs Before Rewriting: Equip Rewriter with Domain Knowledge via Continual Pre-training

Qi Wang; Yixuan Cao; Yifan Liu; Jiangtao Zhao; Ping Luo

arXiv:2507.00477 · cs.IR · November 13, 2025

TL;DR

This paper introduces the R&R rewriter, which uses continual pre-training on domain-specific documents to improve query rewriting in RAG-based QA systems, especially in specialized fields.

Contribution

The paper proposes a novel continual pre-training approach for query rewriting that incorporates domain knowledge, enhancing RAG-based QA performance in specialized domains.
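Continual pre-training of a causal language model typically means further training on raw text with the ordinary next-token-prediction objective. The sketch below shows one common way such a domain corpus is prepared: the documents are concatenated and packed into fixed-length training blocks. It assumes a toy whitespace tokenizer. The paper's actual tokenizer, block length, and preprocessing are not described here, and `build_vocab`, `pack_documents`, `next_token_pairs`, and the sample sentences are all hypothetical.

```python
from typing import Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-document marker

def build_vocab(docs: List[str]) -> Dict[str, int]:
    """Toy whitespace tokenizer: give each new token the next free integer id."""
    vocab: Dict[str, int] = {EOS: 0}
    for doc in docs:
        for tok in doc.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def pack_documents(docs: List[str], vocab: Dict[str, int], block_size: int) -> List[List[int]]:
    """Concatenate tokenized documents, separated by EOS, and cut the stream
    into equal-length blocks; the incomplete tail is dropped."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(vocab[tok] for tok in doc.split())
        stream.append(vocab[EOS])
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]

def next_token_pairs(block: List[int]) -> Tuple[List[int], List[int]]:
    """Causal-LM objective: the model reads block[:-1] and predicts block[1:]."""
    return block[:-1], block[1:]

docs = ["net profit rose", "EBITDA margin fell sharply"]  # stand-ins for professional documents
vocab = build_vocab(docs)
blocks = pack_documents(docs, vocab, block_size=4)
print(blocks)                       # [[1, 2, 3, 0], [4, 5, 6, 7]]
print(next_token_pairs(blocks[0]))  # ([1, 2, 3], [2, 3, 0])
```

Packing avoids padding waste: every position in every block contributes a training signal. The EOS separator lets the model see where one document ends and the next begins.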

Findings

01. R&R improves domain-specific QA accuracy

02. Effective in multiple professional domains

03. Maintains good performance in general scenarios

Abstract

A Retrieval-Augmented Generation (RAG)-based question-answering (QA) system enhances a large language model's knowledge by retrieving relevant documents based on user queries. Discrepancies between user queries and document phrasings often necessitate query rewriting. However, in specialized domains, the rewriter model may struggle due to limited domain-specific knowledge. To resolve this, we propose the R&R (Read the doc before Rewriting) rewriter, which involves continual pre-training on professional documents, akin to how students prepare for open-book exams by reviewing textbooks. Additionally, it can be combined with supervised fine-tuning for improved results. Experiments on multiple datasets demonstrate that R&R excels in professional QA across multiple domains, effectively bridging the query-document gap, while maintaining good performance in general scenarios, thus advancing…
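To make the query-document gap concrete, the toy example below uses a simple term-overlap retriever in place of a real one. A colloquial user query shares no content words with the relevant document. A rewrite phrased in the document's terminology retrieves it. The rewritten query is hard-coded here to stand in for what a domain-aware rewriter might produce. This is not the paper's rewriter or retriever, and `STOPWORDS`, `terms`, `retrieve`, and the corpus entries are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical, deliberately tiny stopword list for the toy retriever.
STOPWORDS: Set[str] = {"a", "the", "did", "to", "of", "over", "per", "this"}

def terms(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {t for t in text.lower().split() if t not in STOPWORDS}

def retrieve(query: str, corpus: Dict[str, str]) -> Optional[str]:
    """Return the id of the document sharing the most terms with the query,
    or None if no document shares any term."""
    q = terms(query)
    best_id, best_score = None, 0
    for doc_id, text in corpus.items():
        score = len(q & terms(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id

corpus = {
    "annual_report": "net profit attributable to shareholders declined year over year",
    "dividend_notice": "board approved a cash dividend per share",
}
user_query = "did the firm earn less money recently"
rewritten_query = "net profit declined year over year"  # hypothetical rewriter output

print(retrieve(user_query, corpus))       # None
print(retrieve(rewritten_query, corpus))  # annual_report
```

A rewriter that has never seen phrases like "net profit" or "attributable to shareholders" has no basis for producing such a rewrite. This is the motivation the abstract gives for reading the domain documents first.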

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning