ESGLens: An LLM-Based RAG Framework for Interactive ESG Report Analysis and Score Prediction
Tsung-Yu Yang, Meng-Chi Chen

TL;DR
ESGLens is a domain-specific framework that automates ESG report analysis and score prediction using retrieval-augmented generation and prompt-engineered extraction, evaluated on about 300 reports.
Contribution
The paper introduces ESGLens, an LLM-based RAG framework built specifically for ESG reports. It covers three tasks: GRI-guided information extraction, question-answering with source traceability, and ESG score prediction.
Findings
ChatGPT embeddings with neural network regressors achieve a Pearson correlation of 0.48.
The framework successfully extracts claims with 80% source verification.
Evaluation on about 300 reports demonstrates the method's potential despite dataset limitations.
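For readers unfamiliar with the metric in the first finding, Pearson correlation measures how closely two sets of scores follow a linear relationship, from -1 to 1. The sketch below is a plain-Python illustration of the formula, not the paper's evaluation code; the function name `pearson_r` and the score values are made up for the example.

```python
import math


def pearson_r(predicted, reference):
    """Pearson correlation between two equal-length lists of scores."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)


# Illustrative values only; these are NOT results from the paper.
predicted = [55.0, 62.0, 48.0, 70.0, 66.0]
reference = [50.0, 68.0, 52.0, 64.0, 75.0]
print(round(pearson_r(predicted, reference), 3))
```

A value of 0.48 on this scale indicates a moderate positive linear relationship between the two sets of scores.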
Abstract
Environmental, Social, and Governance (ESG) reports are central to investment decision-making, yet their length, heterogeneous content, and lack of standardized structure make manual analysis costly and inconsistent. We present ESGLens, a proof-of-concept framework combining retrieval-augmented generation (RAG) with prompt-engineered extraction to automate three tasks: (1) structured information extraction guided by Global Reporting Initiative (GRI) standards, (2) interactive question-answering with source traceability, and (3) ESG score prediction via regression on LLM-generated embeddings. ESGLens is purpose-built for the domain: a report-processing module segments heterogeneous PDF content into typed chunks (text, tables, charts); a GRI-guided extraction module retrieves and synthesizes information aligned with specific standards; and a scoring module embeds extracted summaries and…
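The idea of typed chunks with source traceability can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Chunk` class, the `retrieve` function, the sample contents, and the page numbers are hypothetical, and simple word overlap stands in for the embedding-based retrieval a RAG system would normally use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    kind: str     # "text", "table", or "chart"
    page: int     # source page, kept so answers can cite where they came from
    content: str


def tokenize(text):
    """Lowercase word set with surrounding punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query, chunks, k=2):
    """Rank chunks by word overlap with the query.

    Toy stand-in for embedding similarity; chunks with no overlap are dropped.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(c.content)), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]


# Invented sample data, not taken from any real report.
chunks = [
    Chunk("text", 12, "Total energy consumption within the organization fell in 2023."),
    Chunk("table", 13, "Scope 1 emissions; Scope 2 emissions; energy intensity"),
    Chunk("chart", 27, "Board gender diversity by year"),
]

hits = retrieve("energy consumption", chunks)
for c in hits:
    print(f"[{c.kind}, p.{c.page}] {c.content}")
```

Carrying the chunk type and page number through retrieval is what lets a generated answer point back to its source passage.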
