Class-Agnostic Region-of-Interest Matching in Document Images
Demin Zhang, Jiahao Lyu, Zhijie Shen, Yu Zhou

TL;DR
This paper introduces a flexible, class-agnostic approach for matching user-defined regions in document images, addressing limitations of fixed-category document analysis methods.
Contribution
It proposes a novel RoI-Matcher framework using a siamese network and cross-attention, and establishes a benchmark with evaluation metrics for this task.
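As a rough illustration of the two ingredients named above, the following is a minimal sketch, not the authors' implementation. It assumes that "siamese" means one encoder with shared weights applied to both the reference and target features, and that cross-attention lets the reference RoI feature attend over target patch features. The function names (`encode`, `cross_attention`), the identity weights, and the toy 2-D features are all hypothetical.

```python
import math

def encode(patches, weights):
    """Siamese encoder stand-in: the SAME linear map is applied to every input."""
    return [[sum(w * x for w, x in zip(row, p)) for row in weights] for p in patches]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        dim = len(values[0])
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in range(dim)])
    return outputs, weights

# Hypothetical toy data: one reference RoI feature, three target patch features.
W = [[1.0, 0.0], [0.0, 1.0]]          # shared encoder weights (identity, for clarity)
ref_patches = [[4.0, 0.0]]
tgt_patches = [[0.0, 4.0], [4.0, 0.0], [0.0, -4.0]]

ref_feat = encode(ref_patches, W)      # same weights for both branches
tgt_feat = encode(tgt_patches, W)

# Assumption: the reference feature is the query, target features are keys/values.
_, attn = cross_attention(ref_feat, tgt_feat, tgt_feat)
best = max(range(len(tgt_patches)), key=lambda i: attn[0][i])
print(best)  # index of the target patch most similar to the reference RoI
```

In this toy setup the attention weights peak at the target patch whose feature matches the reference. A real model would follow this with a head that regresses bounding boxes, which is omitted here.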
Findings
Effective on the RoI-Matching-Bench benchmark
Serves as a baseline for future research
Addresses flexible, open-set region matching in documents
Abstract
Document understanding and analysis have received a lot of attention due to their widespread applications. However, existing document analysis solutions, such as document layout analysis and key information extraction, are only suitable for fixed category definitions and granularities, and cannot support flexible applications customized by users. Therefore, this paper defines a new task named "Class-Agnostic Region-of-Interest Matching" ("RoI-Matching" for short), which aims to match customized regions in a flexible, efficient, multi-granularity, and open-set manner. The visual prompt from the reference document and the target document images are fed into our model, and the output is the corresponding bounding boxes in the target document images. To meet the above requirements, we construct a benchmark RoI-Matching-Bench, which sets three levels of difficulties following real-world…
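Since the model outputs bounding boxes in the target image, one common way to score a predicted box against a ground-truth box is intersection-over-union (IoU). The sketch below illustrates that general idea only: the excerpt does not say which metrics RoI-Matching-Bench uses, so the IoU criterion, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(pred, gt, threshold=0.5):
    """Assumed criterion: a prediction counts as correct if IoU >= threshold."""
    return iou(pred, gt) >= threshold

# Hypothetical boxes: the prediction overlaps half of the ground truth.
pred = (0.0, 0.0, 10.0, 10.0)
gt = (5.0, 0.0, 15.0, 10.0)
print(iou(pred, gt), is_match(pred, gt))
```

Here the boxes share an area of 50 out of a union of 150, so the IoU is one third and the prediction would not count as a match at the assumed 0.5 threshold.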
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Digital Media Forensic Detection
Methods: ALIGN
