Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers
Mélodie Boillet, Martin Maarand, Thierry Paquet, Christopher Kermorvant

TL;DR
This paper demonstrates that incorporating keyword position information into image-based models significantly improves the segmentation of historical registers into meaningful units, such as acts, by combining visual and textual cues.
Contribution
It introduces a simple pipeline that enriches document images with text line positions and shows substantial performance gains in act detection accuracy.
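One way to picture the enrichment step is as an extra image channel that marks where keyword-bearing text lines sit. The sketch below is only an illustration under stated assumptions: the summary does not say how the authors encode positions, so the extra-channel encoding, the function name `add_keyword_channel`, the axis-aligned bounding boxes, and the example keyword are all hypothetical.

```python
import numpy as np


def add_keyword_channel(image, lines, keywords):
    """Append a mask channel marking text lines that contain a keyword.

    image:    H x W x 3 uint8 array (the page scan).
    lines:    list of (text, (x0, y0, x1, y1)) pairs, e.g. from a text
              recognizer; boxes are assumed axis-aligned pixel coordinates.
    keywords: strings to look for (case-insensitive substring match).
    Returns an H x W x 4 uint8 array.
    """
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    lowered = [k.lower() for k in keywords]
    for text, (x0, y0, x1, y1) in lines:
        if any(k in text.lower() for k in lowered):
            # Clip the box to the image before filling it in.
            x0c, x1c = max(0, x0), min(w, x1)
            y0c, y1c = max(0, y0), min(h, y1)
            mask[y0c:y1c, x0c:x1c] = 255
    # Stack the mask behind the three color channels.
    return np.dstack([image, mask])


# Toy page: only the first line contains the (hypothetical) keyword "roi".
page = np.zeros((10, 20, 3), dtype=np.uint8)
text_lines = [("Le roi donne", (2, 1, 8, 3)), ("autre texte", (0, 5, 20, 7))]
enriched = add_keyword_channel(page, text_lines, ["roi"])
```

A segmentation model could then take the four-channel array as input in place of the plain image, so that visual features and keyword positions are available together.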
Findings
Act detection mAP rose from 38% to 74% when textual information was added.
Using keyword positions enhances segmentation accuracy.
Combining visual and textual data improves historical document analysis.
Abstract
The segmentation of complex images into semantic regions has seen a growing interest these last years with the advent of Deep Learning. Until recently, most existing methods for Historical Document Analysis focused on the visual appearance of documents, ignoring the rich information that textual content can offer. However, the segmentation of complex documents into semantic regions is sometimes impossible relying only on visual features and recent models embed both visual and textual information. In this paper, we focus on the use of both visual and textual information for segmenting historical registers into structured and meaningful units such as acts. An act is a text recording containing valuable knowledge such as demographic information (baptism, marriage or death) or royal decisions (donation or pardon). We propose a simple pipeline to enrich document images with the position of…
