QAEncoder: Towards Aligned Representation Learning in Question Answering Systems

Zhengren Wang; Qinhan Yu; Shida Wei; Zhiyu Li; Feiyu Xiong; Xiaoxing Wang; Simin Niu; Hao Liang; Wentao Zhang

arXiv:2409.20434 · cs.CL · July 3, 2025

TL;DR

QAEncoder is a training-free method that improves retrieval in question-answering systems by aligning document embeddings with the queries they are likely to answer, narrowing the query-document gap without extra training or index storage.

Contribution

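The paper introduces QAEncoder, a training-free approach that replaces each document embedding with an estimate of the expected embedding of its potential queries, and attaches a document fingerprint so that the resulting embeddings stay distinguishable.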

Findings

1. Effective across diverse datasets, languages, and embedding models
2. No additional index storage, retrieval latency, or training cost
3. Introduces no catastrophic forgetting or hallucination issues

Abstract

Modern QA systems entail retrieval-augmented generation (RAG) for accurate and trustworthy responses. However, the inherent gap between user queries and relevant documents hinders precise matching. We introduce QAEncoder, a training-free approach to bridge this gap. Specifically, QAEncoder estimates the expectation of potential queries in the embedding space as a robust surrogate for the document embedding, and attaches document fingerprints to effectively distinguish these embeddings. Extensive experiments across diverse datasets, languages, and embedding models confirmed QAEncoder's alignment capability, which offers a simple-yet-effective solution with zero additional index storage, retrieval latency, training costs, or catastrophic forgetting and hallucination issues. The repository is publicly available at https://github.com/IAAR-Shanghai/QAEncoder.
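The two steps named in the abstract (averaging potential queries in the embedding space, then attaching a document fingerprint) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the repository's implementation: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the candidate queries are supplied by hand, and the fingerprint is modeled as a simple interpolation with the document's own embedding, weighted by a hypothetical parameter `alpha`.

```python
import hashlib
import math

DIM = 64  # toy embedding size (assumption, not from the paper)


def normalize(vec):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_embed(text, dim=DIM):
    """Hashed bag-of-words embedding. A stand-in for a real embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return normalize(vec)


def mean_vector(vectors):
    """Component-wise mean: a Monte Carlo estimate of the expected query embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def surrogate_embedding(document, candidate_queries, alpha=0.3, embed=toy_embed):
    """Build one index vector for a document.

    The mean of the candidate-query embeddings estimates the expectation of
    potential queries. Mixing in the document's own embedding with weight
    `alpha` is a hypothetical fingerprint that keeps documents with similar
    queries apart.
    """
    query_mean = mean_vector([embed(q) for q in candidate_queries])
    doc_vec = embed(document)
    mixed = [(1 - alpha) * q + alpha * d for q, d in zip(query_mean, doc_vec)]
    return normalize(mixed)


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Usage: the surrogate replaces the plain document embedding in the index.
doc = "The Eiffel Tower was completed in 1889 in Paris."
queries = [
    "When was the Eiffel Tower completed?",
    "Where is the Eiffel Tower located?",
]
index_vector = surrogate_embedding(doc, queries, alpha=0.3)
user_query = toy_embed("When was the Eiffel Tower built?")
print(round(cosine(index_vector, user_query), 3))
```

In this sketch the surrogate is stored in place of the plain document embedding, so the index still holds one vector per document and the query path is unchanged, in line with the abstract's claim of zero additional index storage and retrieval latency. How the candidate queries are produced is outside the scope of the sketch.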

Code & Models

Repositories

IAAR-Shanghai/QAEncoder · Official

Datasets

zr-wang/FIGNEWS_generated_queries
dataset · 1.7k downloads

Videos

QAEncoder: Towards Aligned Representation Learning in Question Answering Systems

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems

Methods: Attention Is All You Need · Attention Dropout · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Weight Decay · Byte Pair Encoding · BERT · Softmax · Dropout