Cross-Domain Document Object Detection: Benchmark Suite and Method
Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu

TL;DR
This paper introduces a benchmark suite and a novel cross-domain document object detection method that effectively handles domain shifts by aligning features, regions, and rendering layers, significantly improving detection performance across diverse document datasets.
Contribution
The paper establishes a comprehensive benchmark suite for cross-domain document object detection and proposes a new detection model with three alignment modules to address domain shifts.
Findings
The proposed method outperforms baseline models on the benchmark suite.
The three alignment modules significantly improve detection accuracy.
Extensive experiments validate the effectiveness of the approach.
Abstract
Decomposing images of document pages into high-level semantic regions (e.g., figures, tables, paragraphs), document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding. DOD remains a challenging problem as document objects vary significantly in layout, size, aspect ratio, texture, etc. An additional challenge arises in practice because large labeled training datasets are only available for domains that differ from the target domain. We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain. Documents from the two domains may vary significantly in layout, language, and genre. We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model…
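The cross-domain setting described above (labels for source pages only, unlabeled target pages) can be illustrated with a small sketch. This is not the paper's method: it swaps in a deliberately simple stand-in, a mean-feature discrepancy penalty, to show how an alignment term can use target data without labels. The function names, the `weight` parameter, and the idea of representing each page as a plain feature vector are all assumptions made for illustration.

```python
from statistics import fmean


def mean_feature(features):
    """Average a list of equal-length feature vectors, dimension by dimension."""
    return [fmean(dim) for dim in zip(*features)]


def alignment_penalty(source_features, target_features):
    """Squared Euclidean distance between the mean source and mean target feature.

    Hypothetical stand-in for a feature-alignment module: it needs no labels,
    so it can be computed on unlabeled target-domain pages.
    """
    mu_s = mean_feature(source_features)
    mu_t = mean_feature(target_features)
    return sum((s - t) ** 2 for s, t in zip(mu_s, mu_t))


def total_loss(source_detection_loss, source_features, target_features, weight=0.1):
    """Supervised detection loss on labeled source pages plus the alignment term.

    `weight` (assumed value) trades off detection accuracy on the source domain
    against how closely the two feature distributions are pulled together.
    """
    penalty = alignment_penalty(source_features, target_features)
    return source_detection_loss + weight * penalty
```

In this sketch the detection loss is computed only where labels exist (the source domain), while the penalty is the only term that touches target-domain pages, which mirrors the data constraints of the cross-domain DOD setting.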
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
