Optimized Table Tokenization for Table Structure Recognition
Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter Staar

TL;DR
This paper introduces OTSL, an optimized table-structure language that reduces token count and sequence length, leading to faster, more accurate table structure recognition from images with minimal post-processing.
Contribution
The paper proposes OTSL, a new minimal vocabulary language for table-structure recognition that improves accuracy and efficiency over traditional HTML-based representations.
Findings
Vocabulary reduced from 28+ tokens (HTML) to 5 (OTSL)
Sequence length halved on average relative to HTML
Inference time is halved
Abstract
Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model…
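To make the vocabulary and sequence-length comparison concrete, here is a minimal sketch that expands an OTSL sequence into HTML structure tokens and compares the two lengths. The abstract above does not name the five tokens; the sketch assumes they are `C` (cell), `L` (merge with the left neighbour), `U` (merge with the upper neighbour), `X` (merge with both) and `NL` (new line). The HTML tokenization (`<td`, ` colspan="2"`, `>`, `</td>`) and the function name `otsl_to_html_tokens` are likewise assumptions made for illustration, not the authors' implementation.

```python
def otsl_to_html_tokens(otsl):
    """Expand a flat OTSL token list into HTML structure tokens.

    Assumed vocabulary: C, L, U, X, NL (not stated in the abstract).
    """
    # Split the flat sequence into rows at each "NL".
    rows, row = [], []
    for tok in otsl:
        if tok == "NL":
            rows.append(row)
            row = []
        else:
            row.append(tok)
    if row:
        rows.append(row)

    html = []
    for r, cells in enumerate(rows):
        html.append("<tr>")
        for c, tok in enumerate(cells):
            if tok != "C":
                continue  # L, U and X positions belong to an earlier cell's span
            # Count consecutive L tokens to the right to get the column span.
            colspan = 1
            while c + colspan < len(cells) and cells[c + colspan] == "L":
                colspan += 1
            # Count consecutive U tokens below to get the row span.
            rowspan = 1
            while (r + rowspan < len(rows)
                   and c < len(rows[r + rowspan])
                   and rows[r + rowspan][c] == "U"):
                rowspan += 1
            if colspan == 1 and rowspan == 1:
                html += ["<td>", "</td>"]
            else:
                html.append("<td")
                if colspan > 1:
                    html.append(f' colspan="{colspan}"')
                if rowspan > 1:
                    html.append(f' rowspan="{rowspan}"')
                html += [">", "</td>"]
        html.append("</tr>")
    return html


# A 3x3 grid: the first cell spans two columns,
# and the top-right cell spans two rows.
otsl = [
    "C", "L", "C", "NL",
    "C", "C", "U", "NL",
    "C", "C", "C", "NL",
]
html = otsl_to_html_tokens(otsl)
print(len(otsl), len(html))
```

For this toy table the OTSL sequence has 12 tokens and the HTML token sequence has 24. That is only an illustration of why a fixed-width grid encoding tends to be shorter, not a reproduction of the paper's measurements, and real HTML annotations usually carry further tokens such as `<thead>` and `<tbody>`.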
Code & Models
- 🤗 ibm-granite/granite-docling-258M (model, 60k downloads, ♡ 1144)
- 🤗 docling-project/SmolDocling-256M-preview (model, 61k downloads, ♡ 1610)
- 🤗 Compumacy/sm_doc (model, 8 downloads, ♡ 1)
- 🤗 kp-forks/SmolDocling-256M-preview (model, 11 downloads)
- 🤗 Mungert/SmolDocling-256M-preview-GGUF (model, 77 downloads, ♡ 2)
- 🤗 Userb1az/granite-docling-258M-GGUF (model, 82 downloads)
- 🤗 Mungert/granite-docling-258M-GGUF (model, 16 downloads)
- 🤗 pbebbo/granite-docling-258m-fixed (model, 1 download)
- 🤗 Phariadata/granite-docling-258M-untied (model, 3 downloads)
- 🤗 philipp-zettl/ibm-granite__granite-docling-258M (model, 2 downloads)
Taxonomy
Topics: Web Data Mining and Analysis · Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing
