Class-Agnostic Region-of-Interest Matching in Document Images

Demin Zhang; Jiahao Lyu; Zhijie Shen; Yu Zhou

arXiv:2506.21055·cs.CV·June 27, 2025

TL;DR

This paper introduces a flexible, class-agnostic approach for matching user-defined regions in document images, addressing limitations of fixed-category document analysis methods.

Contribution

The paper proposes RoI-Matcher, a framework built on a siamese network and cross-attention, and establishes a benchmark, RoI-Matching-Bench, with evaluation metrics for the task.
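To make the two named ingredients concrete, here is a minimal sketch in plain Python of a shared-weight (siamese) encoder followed by cross-attention. It is not the authors' architecture: the toy feature vectors, the identity weight matrix `W`, and the function names `encode` and `cross_attention` are all assumptions made for illustration, and the choice to let target tokens query the reference RoI tokens is one plausible reading rather than something the summary states.

```python
import math

def linear(x, w):
    """Multiply one feature vector by a weight matrix given as a list of rows."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]

def encode(tokens, w):
    """Siamese encoder: the SAME weights w are applied to whichever input is passed."""
    return [linear(t, w) for t in tokens]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return out

# Hypothetical toy data: 2 tokens from the reference RoI, 3 tokens from the target page.
W = [[1.0, 0.0], [0.0, 1.0]]              # assumed shared encoder weights (identity)
reference_roi = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

ref_feats = encode(reference_roi, W)      # same weights for both branches
tgt_feats = encode(target, W)

# Target tokens (queries) attend to the reference RoI tokens (keys and values).
fused = cross_attention(tgt_feats, ref_feats, ref_feats)
```

Each row of `fused` is a weighted mix of the reference tokens, with the weights reflecting how similar that target token is to each of them. A real model would feed such fused features to a box-prediction head, which is omitted here.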

Findings

01. Effective on the RoI-Matching-Bench benchmark

02. Serves as a baseline for future research

03. Addresses flexible, open-set region matching in documents

Abstract

Document understanding and analysis have received a lot of attention due to their widespread applications. However, existing document analysis solutions, such as document layout analysis and key information extraction, are only suitable for fixed category definitions and granularities, and cannot support flexible applications customized by users. Therefore, this paper defines a new task named "Class-Agnostic Region-of-Interest Matching" ("RoI-Matching" for short), which aims to match the customized regions in a flexible, efficient, multi-granularity, and open-set manner. The visual prompt on the reference document and the target document images are fed into our model, and the output is the corresponding bounding boxes in the target document images. To meet the above requirements, we construct a benchmark, RoI-Matching-Bench, which sets three levels of difficulty following real-world…
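The input/output contract described in the abstract (a reference image with a visual prompt, a target image, and bounding boxes as output) can be sketched as a small data structure. Everything named here is hypothetical: `MatchQuery`, its fields, the file names, and the `(x_min, y_min, x_max, y_max)` box convention are assumptions for illustration. The `iou` helper shows intersection-over-union, a common way to compare a predicted box against an annotated one; the truncated abstract does not say which metrics RoI-Matching-Bench actually uses.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]

@dataclass
class MatchQuery:
    """Hypothetical container for one RoI-Matching request."""
    reference_image: str   # path to the reference document image
    prompt_box: Box        # user-drawn region on the reference image
    target_image: str      # path to the target document image

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative file names and coordinates only.
query = MatchQuery("reference.png", (10.0, 20.0, 110.0, 60.0), "target.png")

predicted: Box = (0.0, 0.0, 2.0, 2.0)     # stand-in for a model output
annotated: Box = (1.0, 1.0, 3.0, 3.0)     # stand-in for a ground-truth box
overlap = iou(predicted, annotated)       # 1 / 7 for these two boxes
```

A matcher would take a `MatchQuery` and return zero or more boxes on the target image; a threshold on `iou` is one common way to decide whether a returned box counts as a correct match.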

Code & Models

Repositories

pd162/roi-matching (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Digital Media Forensic Detection

Methods: ALIGN