Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs
Lokesh Mishra, Sohayl Dhibi, Yusik Kim, Cesar Berrospi Ramis, Shubham Gupta, Michele Dolfi, Peter Staar

TL;DR
This paper introduces a universal method for extracting structured ESG KPI information from tables using large language models, enabling efficient analysis of diverse report formats.
Contribution
It proposes a novel statement-based data structure and a T5-based model for universal information extraction from tables, specifically applied to ESG reports.
Findings
Best model generated statements 82% similar to the ground truth, versus 21% for the baseline
Created SemTabNet, a dataset of over 100K annotated tables
Enabled large-scale ESG report analysis using statement extraction
Abstract
Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in table structure as well as content. We propose Statements, a novel domain-agnostic data structure for extracting quantitative facts and related information. We propose translating tables to statements as a new supervised deep-learning universal information extraction task. We introduce SemTabNet, a dataset of over 100K annotated tables. Investigating a family of T5-based Statement Extraction Models, our best model generates statements which are 82% similar to the ground truth (compared to a baseline of 21%). We…
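To make the idea of a statement concrete, here is a minimal sketch of what a table-to-statements target might look like. It is not the paper's schema or method: the field names (`subject`, `property_name`, `value`, `unit`), the helper `table_to_statements`, and the table values are all hypothetical, and the conversion is a hand-written rule for one fixed layout, whereas the paper trains T5-based models to handle arbitrary layouts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """One quantitative fact. Field names are illustrative assumptions,
    not the schema defined in the paper."""
    subject: str            # what the fact is about, here a reporting year
    property_name: str      # the KPI being measured
    value: float            # numeric value read from the table cell
    unit: Optional[str]     # unit of measure, if the table provides one


def table_to_statements(header: List[str], rows: List[List[str]]) -> List[Statement]:
    """Toy rule-based conversion for a single assumed layout:
    column 0 = KPI name, column 1 = unit, remaining columns = one per subject."""
    statements = []
    for row in rows:
        kpi, unit = row[0], row[1] or None
        for subject, cell in zip(header[2:], row[2:]):
            cell = cell.strip().replace(",", "")  # drop thousands separators
            if not cell:                          # skip empty cells
                continue
            statements.append(Statement(subject, kpi, float(cell), unit))
    return statements


# Invented example values, for illustration only.
header = ["KPI", "Unit", "2021", "2022"]
rows = [
    ["Scope 1 emissions", "tCO2e", "1,200", "1,050"],
    ["Water consumption", "m3", "", "48000"],
]
statements = table_to_statements(header, rows)
for s in statements:
    print(s)
```

Each non-empty numeric cell becomes one self-contained record, so tables with very different layouts can in principle be compared and aggregated once they are reduced to the same flat form. A fixed rule like this breaks as soon as the layout changes, which is the variability the abstract describes.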
Taxonomy
Topics: Data Quality and Management · Computational and Text Analysis Methods · Advanced Text Analysis Techniques
