Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

Jiashuo Sun; Jimeng Shi; Yixuan Xie; Saizhuo Wang; Jash Rajesh Parekh; Pengcheng Jiang; Zhiyi Shi; Jiajun Fan; Qinglong Zheng; Peiran Li; Shaowen Wang; Ge Liu; Jiawei Han

arXiv:2605.12975·cs.AI·May 14, 2026

TL;DR

This paper introduces PyRAG, a framework that reformulates multi-hop retrieval-augmented question answering as executable Python programs, enabling more reliable reasoning, self-repair, and improved performance across multiple benchmarks.

Contribution

It proposes a novel program synthesis approach for multi-hop QA, replacing implicit reasoning with explicit, executable code that enhances interpretability and robustness.

Findings

01

PyRAG outperforms strong baselines on five QA benchmarks.

02

It achieves large gains on compositional multi-hop datasets.

03

The framework yields improvements both in a training-free setting and with RL training.

Abstract

Retrieval-Augmented Generation (RAG) has become a standard approach for knowledge-intensive question answering, but existing systems remain brittle on multi-hop questions, where solving the task requires chaining multiple retrieval and reasoning steps. The key challenge is that current methods represent reasoning through free-form natural language: intermediate states are implicit, retrieval queries can drift from the intended entities, and errors are detected by the same model that produces them, making self-reflection an unreliable, ungrounded signal. We observe that multi-hop question answering is a typical form of step-by-step computation, and that this structured process aligns closely with how code-specialized language models are trained to operate. Motivated by this, we introduce PyRAG, a framework that reformulates multi-hop RAG as program synthesis and execution. Instead of…
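To make the idea of multi-hop question answering as an executable program concrete, here is a minimal sketch. It is not the paper's implementation or the repository's API. The toy corpus, the `retrieve` and `extract` helpers, `run_program`, and both program strings are hypothetical names invented for illustration. The sketch assumes a word-overlap retriever and regex-based extraction in place of a real retriever and language model. It shows two properties the abstract points to: each hop's result lives in a named variable rather than in free-form text, and a failed hop surfaces as an interpreter error rather than as the model's own judgment.

```python
import re

# Hypothetical toy corpus standing in for a real document index.
CORPUS = [
    "The film Silent Harbor was directed by Mara Voss.",
    "Mara Voss was born in the city of Lindholm.",
    "Lindholm is a port city in the country of Norland.",
]


def _words(text: str) -> set:
    """Lowercased word set used for overlap scoring."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str) -> str:
    """Toy retriever (assumption): return the passage with the most word overlap."""
    return max(CORPUS, key=lambda passage: len(_words(query) & _words(passage)))


def extract(pattern: str, passage: str) -> str:
    """Pull one entity out of a passage; raise if the hop cannot be completed."""
    match = re.search(pattern, passage)
    if match is None:
        raise LookupError(f"pattern {pattern!r} not found in {passage!r}")
    return match.group(1)


# A program of the kind a code model might synthesize for the question
# "In which country was the director of Silent Harbor born?"
# Every intermediate entity is an explicit variable.
PROGRAM = r'''
director = extract(r"directed by ([A-Z]\w+ [A-Z]\w+)", retrieve("Silent Harbor directed by"))
city = extract(r"born in the city of (\w+)", retrieve(f"{director} born in the city"))
answer = extract(r"country of (\w+)", retrieve(f"{city} country"))
'''

# A faulty program: the first hop looks for a relation the corpus does not contain.
BROKEN_PROGRAM = r'''
writer = extract(r"written by (\w+)", retrieve("Silent Harbor"))
answer = writer
'''


def run_program(source: str):
    """Execute a synthesized program; return (answer, error_message)."""
    env = {"retrieve": retrieve, "extract": extract}
    try:
        exec(source, env)  # illustration only; real use would need sandboxing
    except Exception as exc:
        # The interpreter, not the generating model, reports the failure.
        return None, f"{type(exc).__name__}: {exc}"
    return env.get("answer"), None


print(run_program(PROGRAM))
print(run_program(BROKEN_PROGRAM))
```

In a fuller system the error string returned by `run_program` could be fed back to the program generator as a repair signal. How PyRAG actually does this is not stated in the truncated abstract, so that step is left out here.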

Code & Models

Repositories

GasolSun36/PyRAG (GitHub)
