A document is worth a structured record: Principled inductive bias design for document recognition
Benjamin Meyer, Lukas Tuggener, Sascha Hänzi, Daniel Schmid, Erdal Ayfer, Benjamin F. Grewe, Ahmed Abdulkadir, Thilo Stadelmann

TL;DR
This paper introduces a novel approach to document recognition that frames it as transcription from a document to a structured record, leveraging structure-specific relational inductive biases within a transformer architecture.
Contribution
It proposes a method to incorporate structure-specific relational inductive biases into end-to-end document recognition systems, enabling effective transcription of complex, structured documents.
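One generic way to express a relational inductive bias in a transformer is to restrict which elements may attend to each other according to the relations in the target structure. The sketch below is a minimal, pure-Python illustration of that idea under this assumption. `relation_mask` and `masked_attention_weights` are hypothetical helpers, the item roles are invented for illustration, and the paper's actual mechanism may differ.

```python
import math


def relation_mask(num_items, edges):
    """Boolean mask: mask[i][j] is True if item i may attend to item j.

    Each item attends to itself and to every item it is linked to in a
    (hypothetical) record graph; links are treated as undirected.
    """
    mask = [[i == j for j in range(num_items)] for i in range(num_items)]
    for a, b in edges:
        mask[a][b] = True
        mask[b][a] = True
    return mask


def masked_attention_weights(scores, mask):
    """Row-wise softmax over attention scores, giving masked-out positions zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Every row allows at least the diagonal, so this max is well defined.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # Hypothetical record: 0 = dimension, 1 = feature line, 2 = tolerance.
    mask = relation_mask(3, [(0, 1), (0, 2)])
    uniform_scores = [[0.0] * 3 for _ in range(3)]
    for row in masked_attention_weights(uniform_scores, mask):
        print([round(w, 3) for w in row])
    # prints [0.333, 0.333, 0.333], then [0.5, 0.5, 0.0], then [0.5, 0.0, 0.5]
```

With equal scores, item 1 splits its attention between itself and item 0 and gives none to item 2, because no relation links items 1 and 2. The structure of the record, not the raw scores, decides what each element can see.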
Findings
Successfully transcribed engineering drawings into records of their interlinked information.
Demonstrated effectiveness on documents with complex record structures, such as sheet music and shape drawings.
First end-to-end model to transcribe mechanical engineering drawings with inherent structure.
Abstract
Many document types use intrinsic, convention-driven structures that serve to encode precise and structured information, such as the conventions governing engineering drawings. However, many state-of-the-art approaches treat document recognition as a mere computer vision problem, neglecting these underlying document-type-specific structural properties, making them dependent on sub-optimal heuristic post-processing and rendering many less frequent or more complicated document types inaccessible to modern document recognition. We suggest a novel perspective that frames document recognition as a transcription task from a document to a record. This implies a natural grouping of documents based on the intrinsic structure inherent in their transcription, where related document types can be treated (and learned) similarly. We propose a method to design structure-specific relational inductive…
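To make the document-to-record framing concrete, the following sketch models a record as typed items plus relations between them, and flattens it into a token sequence that an autoregressive transformer decoder could be trained to emit. The schema (`Item`, `Record`, `linearize`), the item kinds, the relation name, and the token format are all hypothetical and chosen for illustration; they are not the paper's record format.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One element of a hypothetical record, e.g. a dimension on a drawing."""
    kind: str
    value: str


@dataclass
class Record:
    """A structured record: typed items plus (source, relation, target) links."""
    items: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add(self, kind, value):
        """Append an item and return its index for use in relations."""
        self.items.append(Item(kind, value))
        return len(self.items) - 1

    def link(self, src, relation, dst):
        """Record a directed relation between two item indices."""
        self.relations.append((src, relation, dst))


def linearize(record):
    """Flatten a record into tokens: all items first, then all relations.

    This output format is an illustrative assumption only.
    """
    tokens = []
    for idx, item in enumerate(record.items):
        tokens += [f"<item:{idx}>", item.kind, item.value]
    for src, rel, dst in record.relations:
        tokens += ["<rel>", str(src), rel, str(dst)]
    return tokens


if __name__ == "__main__":
    r = Record()
    d = r.add("dimension", "25.0")
    t = r.add("tolerance", "+0.1/-0.1")
    r.link(t, "applies_to", d)
    print(linearize(r))
    # prints ['<item:0>', 'dimension', '25.0', '<item:1>', 'tolerance',
    #         '+0.1/-0.1', '<rel>', '1', 'applies_to', '0']
```

Because relations refer to item indices rather than to positions on the page, the same container could in principle hold a drawing's dimensions and tolerances or a score's notes and ties; only the item kinds and relation names change. This mirrors the abstract's point that documents whose transcriptions share a structure can be treated, and learned, similarly.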
