Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, and Muhammad Zeshan Afzal

TL;DR
This paper adapts and enhances the DETR transformer-based object detection model for graphical object detection in document images, achieving state-of-the-art results and demonstrating its effectiveness compared to traditional methods.
Contribution
The paper introduces modifications to the DETR model, including different query strategies and noise addition, to improve graphical object detection in document images.
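The "noise addition" idea can be pictured with a small sketch. This sketch assumes the box-jittering scheme popularized by DN-DETR, which shifts each ground-truth box center and rescales its width and height before feeding it to the decoder as a noised query. That is an assumption about what noise addition means here, not the authors' exact procedure. `add_box_noise` and `noise_scale` are hypothetical names.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def add_box_noise(boxes: List[Box], noise_scale: float = 0.4, seed: int = 0) -> List[Box]:
    """Jitter ground-truth boxes into noised anchor queries (hypothetical, DN-DETR-style sketch).

    Center shift: at most noise_scale * w / 2 horizontally and noise_scale * h / 2 vertically.
    Size change: w and h are each multiplied by a factor in [1 - noise_scale, 1 + noise_scale].
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    noised = []
    for cx, cy, w, h in boxes:
        dx = rng.uniform(-1.0, 1.0) * noise_scale * w / 2
        dy = rng.uniform(-1.0, 1.0) * noise_scale * h / 2
        sw = 1.0 + rng.uniform(-1.0, 1.0) * noise_scale
        sh = 1.0 + rng.uniform(-1.0, 1.0) * noise_scale
        # Clamp so noised boxes stay inside normalized image coordinates.
        ncx = min(max(cx + dx, 0.0), 1.0)
        ncy = min(max(cy + dy, 0.0), 1.0)
        nw = min(max(w * sw, 1e-4), 1.0)
        nh = min(max(h * sh, 1e-4), 1.0)
        noised.append((ncx, ncy, nw, nh))
    return noised


# Usage: two page regions, e.g. a wide table and a small figure (illustrative values).
gt = [(0.5, 0.4, 0.6, 0.3), (0.3, 0.8, 0.2, 0.1)]
print(add_box_noise(gt, noise_scale=0.4, seed=1))
```

During training, the decoder would be asked to reconstruct the original boxes from such noised queries. Under this assumption, that extra reconstruction task is what makes predictions less sensitive to small shifts in object size and position.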
Findings
Achieved state-of-the-art mAP scores on multiple datasets.
Transformer-based methods outperform traditional CNN-based approaches.
Query modifications improve robustness to object size and position variations.
Abstract
This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent advances in CNN-based object detection and have achieved remarkable progress. More recently, Transformer-based detectors have considerably boosted generic object detection performance. They use object queries, which eliminates the need for hand-crafted features and for post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of these enhanced transformer-based detectors has yet to be verified for graphical object detection. Inspired by the latest advancements in DETR, we employ the existing detection transformer with a few modifications for graphical object detection. We modify object queries in different ways, using points, anchor boxes and…
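The claim that object queries remove the need for NMS rests on one-to-one matching. During training, each ground-truth object is assigned to exactly one prediction, so duplicate predictions are pushed toward "no object" instead of being suppressed afterwards. The sketch below assumes a simplified version of the DETR matching cost (L1 box distance plus a classification term, without DETR's generalized-IoU term). It uses brute force over permutations, which is only viable for tiny examples; the Hungarian algorithm solves the same assignment problem efficiently. The function names and the class indices (0 = table, 1 = figure) are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def match_cost(pred_box: Box, pred_probs: Sequence[float], gt_box: Box, gt_label: int) -> float:
    """Simplified matching cost: L1 box distance + (1 - probability of the ground-truth class)."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return l1 + (1.0 - pred_probs[gt_label])


def bipartite_match(
    pred_boxes: Sequence[Box],
    pred_probs: Sequence[Sequence[float]],
    gt_boxes: Sequence[Box],
    gt_labels: Sequence[int],
) -> List[Tuple[int, int]]:
    """Return (prediction_index, gt_index) pairs with minimum total cost (brute force)."""
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    if n_gt > n_pred:
        raise ValueError("need at least as many queries as ground-truth objects")
    best_cost, best_perm = float("inf"), ()
    # perm[g] is the prediction assigned to ground-truth object g.
    for perm in permutations(range(n_pred), n_gt):
        cost = sum(
            match_cost(pred_boxes[p], pred_probs[p], gt_boxes[g], gt_labels[g])
            for g, p in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, g) for g, p in enumerate(best_perm)]


# Usage: a table (class 0) and a figure (class 1); prediction 1 is a lower-confidence duplicate of the table.
gt_boxes = [(0.5, 0.3, 0.6, 0.2), (0.5, 0.7, 0.4, 0.3)]
gt_labels = [0, 1]
pred_boxes = [(0.5, 0.7, 0.4, 0.3), (0.52, 0.31, 0.58, 0.2), (0.5, 0.3, 0.6, 0.2)]
pred_probs = [[0.1, 0.9], [0.6, 0.4], [0.95, 0.05]]
print(bipartite_match(pred_boxes, pred_probs, gt_boxes, gt_labels))  # [(2, 0), (0, 1)]
```

The duplicate (prediction 1) is left unmatched and would be trained as "no object". Because each object is claimed by a single query in this way, no NMS step is required at inference.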
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Byte Pair Encoding · Residual Connection
