A document is worth a structured record: Principled inductive bias design for document recognition

Benjamin Meyer; Lukas Tuggener; Sascha Hänzi; Daniel Schmid; Erdal Ayfer; Benjamin F. Grewe; Ahmed Abdulkadir; Thilo Stadelmann

arXiv:2507.08458·cs.CV·April 15, 2026

TL;DR

This paper introduces a novel approach to document recognition that frames it as transcription from a document into a structured record, and it exploits structure-specific relational inductive biases within a transformer architecture.

Contribution

It proposes a method to incorporate structure-specific relational inductive biases into end-to-end document recognition systems, enabling effective transcription of complex, structured documents.
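One generic way to express a relational inductive bias in a transformer is to let self-attention operate only over token pairs that the record structure declares related. The sketch below illustrates that idea in isolation. It is a standard attention-masking technique, not the architecture described in the paper, and the helper names `relation_mask` and `masked_attention`, as well as the example relations, are hypothetical.

```python
import numpy as np

def relation_mask(num_tokens, edges, self_loops=True):
    """Boolean [n, n] mask: True where token i may attend to token j.

    `edges` is a hypothetical list of (i, j) relations taken from a record
    schema; relations are treated as undirected here (an assumption).
    """
    mask = np.zeros((num_tokens, num_tokens), dtype=bool)
    for i, j in edges:
        mask[i, j] = True
        mask[j, i] = True
    if self_loops:
        np.fill_diagonal(mask, True)  # every token may attend to itself
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to related token pairs."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)        # block unrelated pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                        # exp(-inf) == 0 exactly
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical example: token 0 is a dimension, token 1 its tolerance,
# and tokens 2-3 form an unrelated annotation pair.
rng = np.random.default_rng(0)
n, d = 4, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
mask = relation_mask(n, [(0, 1), (2, 3)])
out, w = masked_attention(q, k, v, mask)
print(out.shape)  # (4, 8)
```

Because every token keeps a self-loop, each row of the mask has at least one allowed entry, so the softmax never normalizes over an all-masked row.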

Findings

1. Successfully transcribed engineering drawings into their interlinked information.
2. Demonstrated effectiveness on complex record structures such as sheet music and shape drawings.
3. Presented the first end-to-end model that transcribes mechanical engineering drawings together with their inherent structure.

Abstract

Many document types use intrinsic, convention-driven structures that serve to encode precise and structured information, such as the conventions governing engineering drawings. However, many state-of-the-art approaches treat document recognition as a mere computer vision problem, neglecting these underlying document-type-specific structural properties, making them dependent on sub-optimal heuristic post-processing and rendering many less frequent or more complicated document types inaccessible to modern document recognition. We suggest a novel perspective that frames document recognition as a transcription task from a document to a record. This implies a natural grouping of documents based on the intrinsic structure inherent in their transcription, where related document types can be treated (and learned) similarly. We propose a method to design structure-specific relational inductive…
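To make the document-to-record framing concrete, here is a minimal sketch of a record for an engineering drawing as typed entities plus links, flattened into a token sequence that a sequence model could be trained to emit. The schema (`Entity`, `Record`, `linearize`), the entity kinds, and the `<ref:i>` pointer-token format are hypothetical assumptions for illustration; the paper's actual record formats are not described in this summary.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Entity:
    kind: str   # hypothetical labels, e.g. "dimension", "tolerance"
    value: str

@dataclass
class Record:
    entities: list[Entity] = field(default_factory=list)
    # Directed links between entities, stored as index pairs (hypothetical).
    links: list[tuple[int, int]] = field(default_factory=list)

    def add(self, kind: str, value: str) -> int:
        """Append an entity and return its index."""
        self.entities.append(Entity(kind, value))
        return len(self.entities) - 1

    def link(self, a: int, b: int) -> None:
        self.links.append((a, b))

def linearize(record: Record) -> list[str]:
    """Flatten a record into tokens; outgoing links become '<ref:i>' tokens."""
    tokens = []
    for idx, ent in enumerate(record.entities):
        tokens += [f"<{ent.kind}>", ent.value]
        tokens += [f"<ref:{b}>" for a, b in record.links if a == idx]
        tokens.append(f"</{ent.kind}>")
    return tokens

rec = Record()
dim = rec.add("dimension", "25.0")
tol = rec.add("tolerance", "+/-0.1")
rec.link(dim, tol)  # the tolerance belongs to this dimension
tokens = linearize(rec)
print(tokens)
# ['<dimension>', '25.0', '<ref:1>', '</dimension>', '<tolerance>', '+/-0.1', '</tolerance>']
```

Two drawings whose records share this entity-and-link schema would share the same target structure, which mirrors the abstract's point that document types can be grouped by the structure of their transcription.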