TL;DR
TUTA introduces a unified transformer-based pre-training framework that effectively captures spatial, hierarchical, and semantic information in generally structured tables using a novel bi-dimensional coordinate tree and three progressive objectives.
Contribution
It proposes a novel tree-based structure and attention mechanisms for understanding diverse table structures, advancing pre-training for table understanding tasks.
Findings
Achieves state-of-the-art results on five datasets.
Effectively models spatial, hierarchical, and semantic table information.
Improves performance on cell and table type classification tasks.
Abstract
Tables are widely used with various structures to organize and present data. Recent attempts at table understanding mainly focus on relational tables, yet overlook other common table structures. In this paper, we propose TUTA, a unified pre-training architecture for understanding generally structured tables. Noticing that understanding a table requires spatial, hierarchical, and semantic information, we enhance transformers with three novel structure-aware mechanisms. First, we devise a unified tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information of generally structured tables. Building on this, we propose tree-based attention and position embedding to better capture the spatial and hierarchical information. Moreover, we devise three progressive pre-training objectives to enable representations at the token, cell, and table…
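As a rough illustration of the bi-dimensional coordinate tree and tree-based attention described in the abstract, the Python sketch below gives each cell a pair of tree paths (one in the top header tree, one in the left header tree) and lets a cell attend only to cells within a distance threshold. The abstract does not specify the distance function or the threshold, so `tree_distance`, `cell_distance`, `visibility_mask`, `max_dist`, and the toy table are hypothetical choices made for illustration, not TUTA's implementation.

```python
from typing import List, Tuple

# Assumption: a node is addressed by the child indices on the way down
# from the root of its header tree.
Path = Tuple[int, ...]
# Assumption: a cell coordinate is (path in top tree, path in left tree).
Coord = Tuple[Path, Path]


def tree_distance(a: Path, b: Path) -> int:
    """Number of edges between two nodes of the same tree."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Steps up from a to the lowest common ancestor, then down to b.
    return (len(a) - common) + (len(b) - common)


def cell_distance(p: Coord, q: Coord) -> int:
    """Illustrative choice: sum the distances in the top and left trees."""
    return tree_distance(p[0], q[0]) + tree_distance(p[1], q[1])


def visibility_mask(coords: List[Coord], max_dist: int) -> List[List[bool]]:
    """mask[i][j] is True when cell i may attend to cell j."""
    return [[cell_distance(p, q) <= max_dist for q in coords] for p in coords]


# Toy table: top header "Sales" with children Q1 and Q2; left header rows
# 2020 and 2021. Four data cells, listed row by row.
coords: List[Coord] = [
    ((0, 0), (0,)),  # 2020, Q1
    ((0, 1), (0,)),  # 2020, Q2
    ((0, 0), (1,)),  # 2021, Q1
    ((0, 1), (1,)),  # 2021, Q2
]
mask = visibility_mask(coords, max_dist=2)
```

With `max_dist=2` in this toy table, a data cell can see the cells that share its row or column but not the diagonally opposite cell, which is the kind of structure-aware locality the tree-based attention is meant to encourage.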
