QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval

Hongming Tan; Shaoxiong Zhan; Hai Lin; Hai-Tao Zheng; Wai Kin Chan

arXiv:2407.20207 · cs.CL · March 4, 2025

Open Access

TL;DR

This paper introduces QAEA-DR, a text augmentation framework that enhances dense retrieval by transforming raw texts into information-rich formats using large language models, improving query-text matching without altering existing retrieval methods.

Contribution

The paper proposes a novel augmentation framework combining question-answer pairs and event extraction, with a scoring mechanism to improve dense retrieval performance.

Findings

01. Improved retrieval accuracy demonstrated in experiments.

02. Effective augmentation method without changing existing models.

03. Theoretical analysis supports the approach's validity.

Abstract

In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Additionally, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Recent studies mainly focus on improving the sentence embedding model or retrieval process. In this work, we introduce a novel text augmentation framework for dense retrieval. This framework transforms raw documents into information-dense text formats, which supplement the original texts to effectively address the aforementioned issues without modifying embedding or retrieval methodologies. Two text representations are generated via large language models (LLMs) zero-shot prompting: question-answer pairs and element-driven events. We term this approach QAEA-DR: unifying question-answer generation and event extraction in a text…
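The idea in the abstract, indexing generated question-answer and event texts alongside the original document while leaving the embedding and retrieval code untouched, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the augmented texts are hand-written stand-ins for LLM output, `embed` is a toy bag-of-words vectorizer rather than a real sentence encoder, and taking the maximum similarity per document is a simple placeholder, not the paper's scoring mechanism.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (hypothetical stand-in for a sentence encoder)."""
    tokens = [t.strip(".,?!:;").lower() for t in text.split()]
    return Counter(t for t in tokens if t)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs, augmentations):
    """Index each original text plus its augmented texts under the source doc id."""
    index = []
    for doc_id, text in docs.items():
        index.append((doc_id, embed(text)))
        for aug in augmentations.get(doc_id, []):
            index.append((doc_id, embed(aug)))
    return index

def retrieve(query, index):
    """Rank doc ids by the best similarity among their original and augmented vectors.

    Max-aggregation is an assumption of this sketch, not the paper's scoring rule.
    """
    q = embed(query)
    best = {}
    for doc_id, vec in index:
        best[doc_id] = max(best.get(doc_id, 0.0), cosine(q, vec))
    return sorted(best, key=best.get, reverse=True)

# Hand-written example data; in the framework these augmentations would come from an LLM.
docs = {
    "d1": "The annual report, after lengthy remarks on weather and staffing, "
          "notes that the company opened a new plant in Ohio in 2021.",
    "d2": "A recipe blog post describing how to bake sourdough bread at home.",
}
augmentations = {
    "d1": [
        "Question: Where did the company open a new plant? Answer: Ohio.",
        "Event: company opened new plant. Location: Ohio. Time: 2021.",
    ],
    "d2": ["Question: What does the post describe? Answer: Baking sourdough bread."],
}

index = build_index(docs, augmentations)
ranking = retrieve("Where did the company open a new plant?", index)
```

The retrieval function never needs to know which vectors came from augmentation, which is the sense in which the approach leaves the existing retriever unchanged.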

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus · ALIGN