TDATR: Improving End-to-End Table Recognition via Table Detail-Aware Learning and Cell-Level Visual Alignment

Chunxia Qin; Chenyu Liu; Pengcheng Xia; Jun Du; Baocai Yin; Bing Yin; Cong Liu

arXiv:2603.22819·cs.CV·March 25, 2026

TL;DR

TDATR introduces a novel end-to-end table recognition method that leverages table detail-aware learning and cell-level visual alignment to improve robustness and accuracy, especially in data-limited scenarios.

Contribution

The paper proposes TDATR, a new end-to-end framework that jointly perceives table structure and content, and incorporates a structure-guided cell localization module for enhanced performance.

Findings

1. Achieves state-of-the-art results on seven benchmarks.
2. Effectively handles limited training data scenarios.
3. Improves interpretability and alignment in table recognition.

Abstract

Tables are pervasive in diverse documents, making table recognition (TR) a fundamental task in document analysis. Existing modular TR pipelines model table structure and content separately, leading to suboptimal integration and complex workflows. End-to-end approaches rely heavily on large-scale TR data and struggle in data-constrained scenarios. To address these issues, we propose TDATR (Table Detail-Aware Table Recognition), which improves end-to-end TR through table detail-aware learning and cell-level visual alignment. TDATR adopts a "perceive-then-fuse" strategy. The model first performs table detail-aware learning to jointly perceive table structure and content through multiple structure understanding and content recognition tasks designed under a language modeling paradigm. These tasks can naturally leverage document data from diverse scenarios to enhance model robustness. The model…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Topic Modeling