From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis
Xuan Li, Jialiang Dong, Raymond Wong

TL;DR
This paper introduces DOTABLER, a novel framework for deep semantic parsing of tables within documents, enabling advanced understanding and retrieval by uncovering semantic links between tables and their context.
Contribution
The paper presents DOTABLER, a comprehensive semantic parsing framework that integrates deep contextual understanding and domain-specific fine-tuning for improved table analysis in documents.
Findings
Achieves precision and F1 scores above 90% on real-world PDF datasets.
Outperforms models like GPT-4o in semantic table analysis tasks.
Demonstrates effective deep parsing of tables and their contextual associations.
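The summary reports precision and F1 but does not say what unit is scored. As a rough illustration only, the sketch below assumes that each prediction is a link between a table and a context segment, written as a hypothetical `(table_id, segment_id)` pair. It scores a set of predicted links against gold links with the standard precision, recall, and F1 definitions. This is not DOTABLER's evaluation code.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted table-context links against gold links.

    Assumption: each link is a hypothetical (table_id, segment_id) pair;
    the paper's actual evaluation unit is not specified in this summary.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # links that appear in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: two of three predicted links are correct.
gold = {("T1", "p3"), ("T1", "p7"), ("T2", "p9")}
pred = {("T1", "p3"), ("T2", "p9"), ("T2", "p4")}
print(precision_recall_f1(pred, gold))
```

Here two of the three predicted links match the gold set and two of the three gold links are recovered. Precision, recall, and F1 therefore each come out to 2/3.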
Abstract
Documents are core carriers of information and knowledge, with broad applications in finance, healthcare, and scientific research. Tables, as the main medium for structured data, encapsulate key information and are among the most critical document components. Existing studies largely focus on surface-level tasks such as layout analysis, table detection, and data extraction, lacking deep semantic parsing of tables and their contextual associations. This limits advanced tasks like cross-paragraph data interpretation and context-consistent analysis. To address this, we propose DOTABLER, a table-centric semantic document parsing framework designed to uncover deep semantic links between tables and their context. DOTABLER leverages a custom dataset and domain-specific fine-tuning of pre-trained models, integrating a complete parsing pipeline to identify context segments semantically tied to…
