Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report
Yan Chen, Yu Zou, Jialei Zeng, Haoran You, Xiaorui Zhou, Aixi Zhong

TL;DR
Pharos-ESG is a comprehensive framework that converts complex ESG reports into structured, multimodal, and hierarchically labeled data to facilitate financial analysis and decision-making.
Contribution
It introduces a novel multimodal parsing and hierarchical labeling framework for ESG reports, along with a large-scale annotated dataset, Aurora-ESG.
Findings
Outperforms existing document parsing and multimodal models on benchmark datasets.
Effectively captures layout, hierarchy, and semantic content of ESG reports.
Enables better integration of ESG data into financial analysis.
Abstract
Environmental, Social, and Governance (ESG) principles are reshaping the foundations of global financial governance, transforming capital allocation architectures, regulatory frameworks, and systemic risk coordination mechanisms. However, ESG reports, the core medium for assessing corporate ESG performance, are difficult to understand at scale for two reasons: slide-like, irregular layouts produce a chaotic reading order, and lengthy, weakly structured content leaves the document hierarchy implicit. To address these challenges, we propose Pharos-ESG, a unified framework that transforms ESG reports into structured representations through multimodal parsing, contextual narration, and hierarchical labeling. It integrates a reading-order modeling module based on layout flow, hierarchy-aware segmentation guided by table-of-contents anchors, and a multimodal aggregation pipeline…
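To make the idea of table-of-contents anchors concrete, here is a minimal sketch of one way such segmentation could work. It is an illustration only, not the authors' implementation: the `TocEntry` class, the `label_blocks` function, the exact-match rule, and the sample data are all hypothetical, and it assumes the text blocks have already been placed in reading order.

```python
from dataclasses import dataclass


@dataclass
class TocEntry:
    title: str
    level: int  # 1 = top-level chapter, 2 = section, and so on


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so headings match TOC titles loosely."""
    return " ".join(text.lower().split())


def label_blocks(toc, blocks):
    """Attach a hierarchical path to each text block (hypothetical helper).

    `blocks` must already be in reading order. A block whose text matches a
    TOC title is treated as an anchor and updates the current path; every
    block, anchors included, is labeled with the path in effect.
    """
    anchors = {normalize(entry.title): entry for entry in toc}
    path = []  # path[i] holds the active heading at level i + 1
    labeled = []
    for block in blocks:
        entry = anchors.get(normalize(block))
        if entry is not None:
            # Drop headings at the same or a deeper level, then push this one.
            path = path[: entry.level - 1] + [entry.title]
        labeled.append((tuple(path), block))
    return labeled


# Toy data, invented purely for illustration.
toc = [
    TocEntry("Environment", 1),
    TocEntry("Carbon Emissions", 2),
    TocEntry("Governance", 1),
]
blocks = [
    "Environment",
    "Our climate strategy.",
    "Carbon Emissions",
    "Scope 1 emissions are reported here.",
    "Governance",
    "Board oversight is described here.",
]

for path, text in label_blocks(toc, blocks):
    print(" > ".join(path), "|", text)
```

Exact title matching keeps the sketch short. Real reports would likely need fuzzy matching and layout cues, since headings rarely reproduce the table of contents verbatim.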
