TL;DR
This paper demonstrates that table detectors fine-tuned from models trained on document images are significantly more accurate than those fine-tuned from models trained on natural images, underscoring the importance of closeness between the source and target domains.
Contribution
The study shows that employing close-domain fine-tuning from document image datasets enhances table detection accuracy over traditional natural image pre-training.
Findings
Fine-tuning from document image datasets improves accuracy by up to 60%.
Models trained on TableBank outperform those fine-tuned from natural images.
Close-domain fine-tuning is more effective for table detection in document images.
Abstract
A correct localisation of tables in a document is instrumental for determining their structure and extracting their contents; therefore, table detection is a key step in table understanding. Nowadays, the most successful methods for table detection in document images employ deep learning algorithms; and, particularly, a technique known as fine-tuning. In this context, such a technique exports the knowledge acquired to detect objects in natural images to detect tables in document images. However, there is only a vague relation between natural and document images, and fine-tuning works better when there is a close relation between the source and target task. In this paper, we show that it is more beneficial to employ fine-tuning from a closer domain. To this aim, we train different object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) using the TableBank dataset (a…
Taxonomy
Methods: Region Proposal Network · Focal Loss · Softmax · RoIAlign · Feature Pyramid Network · Convolution · RetinaNet · Non Maximum Suppression · Mask R-CNN · 1x1 Convolution
