SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents

Mehrdad Saberi; Keivan Rezaei; Soheil Feizi

arXiv:2605.21965·cs.CL·May 22, 2026

TL;DR

SpecHop is a framework that reduces the latency of multi-hop retrieval agents by continuously running multiple speculative threads, without changing the trajectory the model would have taken without acceleration.

Contribution

The paper introduces a lossless speculation framework that verifies predicted tool observations asynchronously as the target tool outputs arrive, committing correct branches and rolling back incorrect ones, so latency drops without sacrificing correctness.

Findings

01. SpecHop reduces retrieval latency by up to 40%.

02. The measured latency gains closely match those predicted by the theoretical framework.

03. Empirical results validate the effectiveness of SpecHop on multi-hop retrieval tasks.

Abstract

Large language models increasingly use external tools such as web search and document retrieval to solve information-intensive tasks. However, multi-hop tool use in complex tasks introduces substantial latency, since the model must repeatedly wait for tool observations before continuing. We study how to accelerate such trajectories without changing the final trajectory the model would have taken without acceleration, assuming access to faster but less reliable speculator tools. We develop a theoretical framework for lossless speculation in multi-hop tool-use settings, characterizing the optimal achievable latency gain. We propose SpecHop, a continuous speculation framework that maintains multiple speculative threads, verifies predicted observations asynchronously as target tool outputs arrive, commits correct branches, and rolls back incorrect ones. This preserves accuracy while…
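The speculate, verify, commit, and roll back loop described in the abstract can be illustrated with a toy latency model. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: the function names (`run_sequential`, `run_speculative`), the toy tools, the constant per-call latencies, the zero model compute time, the unlimited number of parallel threads, and the exact-equality verification rule are all hypothetical choices made for illustration. The thread bookkeeping is collapsed into a simulated clock rather than real concurrency.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (query, verified observation)
NextQuery = Callable[[List[Step]], Optional[str]]
Tool = Callable[[str], str]


def run_sequential(first_query: str, next_query: NextQuery, target: Tool,
                   t_target: float) -> Tuple[List[Step], float]:
    """Baseline: block on the target tool at every hop."""
    trajectory: List[Step] = []
    clock = 0
    query: Optional[str] = first_query
    while query is not None:
        trajectory.append((query, target(query)))
        clock += t_target
        query = next_query(trajectory)
    return trajectory, clock


def run_speculative(first_query: str, next_query: NextQuery, target: Tool,
                    speculator: Tool, t_target: float,
                    t_spec: float) -> Tuple[List[Step], float]:
    """Toy model: continue from the speculator's guess while the target runs."""
    trajectory: List[Step] = []
    issue_time = 0   # simulated time at which the current hop's query is issued
    finish_time = 0  # simulated time at which the latest verification completes
    query: Optional[str] = first_query
    while query is not None:
        guess = speculator(query)  # would arrive at issue_time + t_spec
        truth = target(query)      # would arrive at issue_time + t_target
        finish_time = issue_time + t_target
        # Only target outputs are ever committed, so the result is lossless.
        trajectory.append((query, truth))
        if guess == truth:
            # Commit: the thread that continued from the guess is kept,
            # so the next hop was already issued when the guess arrived.
            issue_time += min(t_spec, t_target)
        else:
            # Roll back: discard the speculative thread and restart the
            # next hop once the true observation is available.
            issue_time += t_target
        query = next_query(trajectory)
    return trajectory, finish_time


# Hypothetical 4-hop chain: each observation becomes the next query.
CHAIN = {"q0": "a", "a": "b", "b": "c", "c": "d"}
MISPREDICTED = {"b"}  # queries on which the toy speculator is wrong


def target(query: str) -> str:
    return CHAIN[query]


def speculator(query: str) -> str:
    return "?" if query in MISPREDICTED else CHAIN[query]


def next_query(trajectory: List[Step]) -> Optional[str]:
    last_observation = trajectory[-1][1]
    return last_observation if last_observation in CHAIN else None


if __name__ == "__main__":
    seq = run_sequential("q0", next_query, target, t_target=100)
    spec = run_speculative("q0", next_query, target, speculator,
                           t_target=100, t_spec=10)
    print("same trajectory:", seq[0] == spec[0])
    print("sequential latency:", seq[1], "speculative latency:", spec[1])
```

In this sketch a correct guess lets the next hop start as soon as the speculator answers, while a wrong guess costs no more than the sequential baseline for that hop, because the target call was already in flight. The committed trajectory is built only from target tool outputs, which is what makes the toy model lossless.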

Code & Models

Repositories

mehrdadsaberi/spechop (GitHub)
