Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

Peiyang Liu; Ziqiang Cui; Xi Wang; Di Liang; Wei Ye

arXiv:2605.01284 · cs.CV · May 5, 2026

1 repo, 2 models, 1 dataset

TL;DR

This paper introduces Chain of Evidence (CoE), a visual attribution framework for iterative retrieval-augmented generation (iRAG) that reasons directly over document images. It provides pixel-level evidence visualization and outperforms text-based methods on visually rich datasets.

Contribution

CoE is a retriever-agnostic visual attribution method that leverages Vision-Language Models to directly analyze document images, eliminating parsing bottlenecks and enhancing interpretability.

Findings

01

CoE outperforms text-based baselines on visually rich datasets.

02

A fine-tuned Qwen3-VL-8B-Instruct model achieves robust performance.

03

CoE provides pixel-level evidence visualization for reasoning.

Abstract

Iterative Retrieval-Augmented Generation (iRAG) has emerged as a powerful paradigm for answering complex multi-hop questions by progressively retrieving and reasoning over external documents. However, current systems predominantly operate on parsed text, which creates two critical bottlenecks: (1) coarse-grained attribution, where users are burdened with manually locating evidence within lengthy documents based on vague text-level citations; and (2) visual semantic loss, where the conversion of visually rich documents (e.g., slides, PDFs with charts) into text discards spatial logic and layout cues essential for reasoning. To bridge this gap, we present Chain of Evidence (CoE), a retriever-agnostic visual attribution framework that leverages Vision-Language Models to reason directly over screenshots of retrieved document candidates. CoE eliminates…
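
To make the iterative loop in the abstract concrete, here is a minimal Python sketch of how a chain of pixel-level evidence might be accumulated across retrieval hops. It is not the authors' implementation: the names `Evidence`, `chain_of_evidence`, `toy_retrieve`, and `toy_attribute` are hypothetical, the retriever and the Vision-Language Model are replaced by hard-coded stubs, and the page names and box coordinates are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in screenshot pixels


@dataclass
class Evidence:
    """One attributed region: which page screenshot, and where on it."""
    page_id: str
    box: Box


# Hypothetical interfaces: a retriever maps a query to page screenshots,
# and an attributor (a VLM in the paper's setting, a stub here) returns
# evidence regions plus the next query, or None when it can answer.
Retriever = Callable[[str], List[str]]
Attributor = Callable[[str, List[str], List[Evidence]],
                      Tuple[List[Evidence], Optional[str]]]


def chain_of_evidence(question: str, retrieve: Retriever,
                      attribute: Attributor, max_hops: int = 4) -> List[Evidence]:
    """Run retrieve-then-attribute hops, collecting evidence regions."""
    chain: List[Evidence] = []
    query = question
    for _ in range(max_hops):
        pages = retrieve(query)
        evidence, next_query = attribute(question, pages, chain)
        chain.extend(evidence)
        if next_query is None:  # attributor signals no further hop is needed
            break
        query = next_query
    return chain


# Toy stand-ins (assumptions for illustration only).
def toy_retrieve(query: str) -> List[str]:
    index = {
        "Where was the director of Film X born?": ["page_a.png"],
        "Where was Director Y born?": ["page_b.png"],
    }
    return index.get(query, [])


def toy_attribute(question, pages, chain):
    if "page_a.png" in pages:
        # First hop: the director's name is found; ask a follow-up query.
        return [Evidence("page_a.png", (40, 120, 380, 160))], "Where was Director Y born?"
    if "page_b.png" in pages:
        # Second hop: the birthplace is found; stop.
        return [Evidence("page_b.png", (60, 300, 420, 340))], None
    return [], None


chain = chain_of_evidence("Where was the director of Film X born?",
                          toy_retrieve, toy_attribute)
for hop, ev in enumerate(chain, start=1):
    print(hop, ev.page_id, ev.box)
```

Because the loop only depends on the two callables, any retriever can be swapped in without touching the attribution step, which is one plausible reading of "retriever-agnostic". The collected boxes are what a viewer would overlay on each screenshot to show the user where the evidence sits.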

Code & Models

Repositories

PeiYangLiu/CoE.git
github

Models

Datasets

PeiyangLiu/wiki-coe
dataset · 248 downloads
