TASER: Table Agents for Schema-guided Extraction and Recommendation
Nicole Cho, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso

TL;DR
TASER is a novel, continuously learning system that extracts and normalizes complex, multi-page financial tables into schema-conforming data, significantly improving detection and extraction accuracy over existing vision-based models.
Contribution
The paper introduces TASER, an agentic, schema-guided extraction system that outperforms existing models and incorporates continuous learning for complex financial tables.
Findings
TASER outperforms Table Transformer by 10.1% in detection accuracy.
Larger batch sizes increase useful schema recommendations by 104.3%.
Manual labeling of 22,584 pages enabled effective training and evaluation.
Abstract
Real-world financial filings report critical information about an entity's investment holdings, essential for assessing that entity's risk, profitability, and relationship profile. Yet, these details are often buried in messy, multi-page, fragmented tables that are difficult to parse, hindering downstream QA and data normalization. Specifically, 99.4% of the tables in our financial table dataset lack bounding boxes, with the largest table spanning 44 pages. To address this, we present TASER (Table Agents for Schema-guided Extraction and Recommendation), a continuously learning, agentic table extraction system that converts highly unstructured, multi-page, heterogeneous tables into normalized, schema-conforming outputs. Guided by an initial portfolio schema, TASER executes table detection, classification, extraction, and recommendations in a single pipeline. Our Recommender Agent reviews…
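To make the phrase "normalized, schema-conforming outputs" concrete, here is a minimal sketch of what schema-guided normalization of one extracted table row might look like. It is not TASER's implementation. The `Holding` schema, the `HEADER_ALIASES` mapping, and the `parse_amount` and `normalize_row` helpers are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Holding:
    """Hypothetical target schema for one portfolio holding (not from the paper)."""
    issuer: str
    market_value: float
    currency: Optional[str] = None


# Hypothetical mapping from raw header text to schema field names.
HEADER_ALIASES: Dict[str, str] = {
    "issuer": "issuer",
    "name of issuer": "issuer",
    "market value": "market_value",
    "fair value": "market_value",
    "currency": "currency",
}


def parse_amount(text: str) -> float:
    """Parse amounts such as '1,234.50' or '(200)'; parentheses mean negative."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])
    return float(cleaned)


def normalize_row(raw: Dict[str, str]) -> Tuple[Holding, List[str]]:
    """Map one raw row to a Holding and return the headers that did not fit."""
    values: Dict[str, str] = {}
    unmatched: List[str] = []
    for header, cell in raw.items():
        field_name = HEADER_ALIASES.get(header.strip().lower())
        if field_name is None:
            # Header has no counterpart in the schema; keep it for later review.
            unmatched.append(header)
        else:
            values[field_name] = cell
    holding = Holding(
        issuer=values["issuer"].strip(),
        market_value=parse_amount(values["market_value"]),
        currency=values.get("currency"),
    )
    return holding, unmatched
```

In this sketch a row missing a required field raises `KeyError`, and headers with no schema counterpart are returned separately so they can be inspected. Whether TASER treats unmatched columns this way is not stated in the text above.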
Taxonomy
Topics: Topic Modeling · Web Data Mining and Analysis
