ESGReveal: An LLM-based approach for extracting structured data from ESG reports
Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, ZongXiong Lei, Zihan Zeng, Shiming Yang, HongXiang Tong, Lei Xiao, Wenwen Zhou

TL;DR
ESGReveal leverages Large Language Models with retrieval techniques to extract and analyze ESG data from corporate reports, achieving high accuracy and revealing insights into disclosure practices across industries.
Contribution
The paper introduces ESGReveal, a novel LLM-based framework that enhances ESG data extraction and analysis from reports, outperforming baseline models in accuracy.
Findings
Achieved 76.9% data extraction accuracy with GPT-4.
Discovered 69.5% environmental and 57.2% social disclosure rates.
Demonstrated improved ESG reporting analysis over existing methods.
Abstract
ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports, catering to the critical need for reliable ESG information retrieval. This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques. The ESGReveal system includes an ESG metadata module for targeted queries, a preprocessing module for assembling databases, and an LLM agent for data extraction. Its efficacy was appraised using ESG reports from 166 companies across various sectors listed on the Hong Kong Stock Exchange in 2022, ensuring comprehensive industry and market capitalization representation. Utilizing ESGReveal unearthed significant insights into ESG reporting with GPT-4, demonstrating an accuracy of 76.9% in data extraction and 83.7% in disclosure analysis, which is an…
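The three components named in the abstract (a metadata module, a preprocessing module, and an LLM agent working over retrieved context) can be illustrated with a minimal sketch. This is not the authors' implementation. Every name here (Indicator, build_chunks, retrieve, stub_llm_extract, extract) is hypothetical, retrieval is reduced to keyword overlap rather than a real vector search, and the LLM call is replaced by a regex stub so the example runs without any external service.

```python
import re
from dataclasses import dataclass


@dataclass
class Indicator:
    """Hypothetical ESG metadata entry: which indicator to query and how to spot it."""
    name: str
    keywords: tuple
    unit: str


def build_chunks(report_text, size=2):
    """Toy preprocessing step: split a report into chunks of `size` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", report_text) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def retrieve(chunks, indicator, top_k=1):
    """Toy retrieval step: rank chunks by the number of indicator keywords they contain."""
    def score(chunk):
        lowered = chunk.lower()
        return sum(kw in lowered for kw in indicator.keywords)

    ranked = sorted(chunks, key=score, reverse=True)
    # Drop chunks that match no keyword at all, so an undisclosed
    # indicator yields an empty context instead of unrelated text.
    return [c for c in ranked[:top_k] if score(c) > 0]


def stub_llm_extract(context, indicator):
    """Stand-in for the LLM agent (assumption): find a number followed by the unit."""
    pattern = r"([\d,]+(?:\.\d+)?)\s*" + re.escape(indicator.unit)
    match = re.search(pattern, context)
    if match is None:
        return {"indicator": indicator.name, "disclosed": False, "value": None}
    value = float(match.group(1).replace(",", ""))
    return {"indicator": indicator.name, "disclosed": True, "value": value}


def extract(report_text, indicator):
    """Chain the steps: chunk the report, retrieve context, then extract a structured record."""
    chunks = build_chunks(report_text)
    context = " ".join(retrieve(chunks, indicator))
    return stub_llm_extract(context, indicator)
```

Calling extract with a report string and an Indicator returns a small dictionary stating whether the indicator was disclosed and, if so, its numeric value. In a real system the stub would be replaced by a prompt to a model such as GPT-4 and the keyword scorer by an embedding-based retriever. Both substitutions are outside what this sketch shows.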
Taxonomy
Topics: Corporate Social Responsibility Reporting
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Softmax · Label Smoothing · Adam · Absolute Position Encodings · Dense Connections
