HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

Peng Peng; Weiwei Lin; Wentai Wu; Xinyang Wang; Yongheng Liu

arXiv:2604.20452 · cs.IR · April 23, 2026

TL;DR

HaS is a homology-aware speculative retrieval framework that cuts RAG retrieval latency with minimal accuracy loss. It retrieves candidate documents from a restricted scope and uses the homology relation between queries to validate them.

Contribution

The paper introduces homology-aware speculative retrieval: a low-latency draft retrieval over a restricted scope, validated by re-identifying a previously observed query that is homologous to the incoming one.

Findings

1. Reduces retrieval latency by up to 37%.
2. Maintains 98-99% of original accuracy.
3. Effective in accelerating multi-hop queries.

Abstract

Retrieval-Augmented Generation (RAG) expands the knowledge boundary of large language models (LLMs) at inference by retrieving external documents as context. However, retrieval becomes increasingly time-consuming as the knowledge databases grow in size. Existing acceleration strategies either compromise accuracy through approximate retrieval, or achieve marginal gains by reusing results of strictly identical queries. We propose HaS, a homology-aware speculative retrieval framework that performs low-latency speculative retrieval over restricted scopes to obtain candidate documents, followed by validating whether they contain the required knowledge. The validation, grounded in the homology relation between queries, is formulated as a homologous query re-identification task: once a previously observed query is identified as a homologous re-encounter of the incoming query, the draft is…
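
The draft-then-validate flow described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation. The class name `SpeculativeRetriever`, the cosine-similarity threshold standing in for homologous query re-identification, the choice of restricted scope (documents returned for earlier queries), and the fallback to full retrieval when validation fails are all assumptions made for this example. The abstract is truncated before it states what happens to a validated draft; the sketch assumes the draft is simply returned.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, doc_ids, doc_vecs, k):
    """Rank the given document ids by similarity to the query, keep the best k."""
    ranked = sorted(doc_ids, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


class SpeculativeRetriever:
    """Hypothetical draft-then-validate retriever (all names are assumptions)."""

    def __init__(self, doc_vecs, k=2, threshold=0.9):
        self.doc_vecs = doc_vecs      # document id -> embedding
        self.k = k
        self.threshold = threshold    # assumed stand-in for the homology test
        self.history = []             # (query embedding, document ids) from full retrievals

    def retrieve(self, query_vec):
        # 1. Speculative retrieval over a restricted scope: here, only the
        #    documents that earlier full retrievals returned (an assumption).
        scope = sorted({d for _, docs in self.history for d in docs})
        draft = top_k(query_vec, scope, self.doc_vecs, self.k)

        # 2. Validation: treat any sufficiently similar past query as a
        #    homologous re-encounter (a simplification of the paper's task).
        validated = any(cosine(query_vec, q) >= self.threshold for q, _ in self.history)
        if draft and validated:
            return draft, "speculative"

        # 3. Assumed fallback: search the whole collection and remember the result.
        docs = top_k(query_vec, sorted(self.doc_vecs), self.doc_vecs, self.k)
        self.history.append((query_vec, docs))
        return docs, "full"
```

In this toy version the saving comes from step 1 ranking only a handful of previously retrieved documents instead of the whole collection. A real system would replace the similarity threshold with the paper's homologous query re-identification and the exhaustive ranking with an actual vector index.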

Code & Models

Repositories

ErrEqualsNil/HaS (GitHub)
