ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

Hao Yang; Yifan Ji; Zhipeng Xu; Zhenghao Liu; Yukun Yan; Zulong Chen; Shuo Wang; Yu Gu; Ge Yu

arXiv:2604.07419 · cs.IR · April 10, 2026

TL;DR

ReAlign improves visual document retrieval by using a superior vision-language model to produce reasoning-guided, fine-grained, query-aware page descriptions as training supervision. This helps the retriever focus on crucial visual cues and raises retrieval accuracy across diverse datasets.

Contribution

The paper introduces ReAlign, a novel method that uses reasoning-guided supervision to improve the alignment of visual document representations with queries.

Findings

01. ReAlign achieves up to 2% relative improvement in retrieval performance.

02. The method generalizes across different VLM backbones.

03. ReAlign improves focus on critical visual cues for document representation.

Abstract

Visual document retrieval aims to retrieve a set of document pages relevant to a query from visually rich collections. Existing methods often employ Vision-Language Models (VLMs) to encode queries and visual pages into a shared embedding space, which is then optimized via contrastive training. However, during visual document representation, localized evidence is usually scattered across complex document layouts, making it difficult for retrieval models to capture crucial cues for effective embedding learning. In this paper, we propose Reasoning-Guided Alignment (ReAlign), a method that enhances visual document retrieval by leveraging the reasoning capability of VLMs to provide fine-grained visual document descriptions as supervision signals for training. Specifically, ReAlign employs a superior VLM to identify query-related regions on a page and then generates a query-aware description…
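The contrastive training that the abstract attributes to existing methods can be illustrated with a minimal in-batch InfoNCE sketch. This is not the paper's implementation, and it does not cover the ReAlign supervision signal itself, whose details are cut off above. The function names, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration; a real retriever would obtain embeddings from a VLM encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query_embs, page_embs, temperature=0.05):
    """In-batch contrastive loss (hypothetical helper).

    page_embs[i] is treated as the positive page for query_embs[i];
    every other page in the batch acts as a negative.
    """
    losses = []
    for i, q in enumerate(query_embs):
        logits = [cosine(q, p) / temperature for p in page_embs]
        # Numerically stable log-sum-exp over all pages in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the positive page.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed values, not real model outputs).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_pages = [[0.9, 0.1], [0.1, 0.9]]   # each page matches its query
swapped_pages = [[0.1, 0.9], [0.9, 0.1]]   # positives are mismatched

aligned_loss = info_nce_loss(queries, aligned_pages)
swapped_loss = info_nce_loss(queries, swapped_pages)
print(aligned_loss < swapped_loss)
```

The loss is lower when each query sits closer to its own page than to the other pages in the batch, which is the property contrastive training optimizes in the shared embedding space.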

Code & Models

Repositories

NEUIR/ReAlign
github

Datasets

yanghaoir/ReAlign-Trainset
dataset · 151 downloads
