TL;DR
GHGbench introduces a comprehensive open dataset and benchmark for multi-entity, multi-task carbon emission prediction at the company and building levels, highlighting key challenges and baseline performance.
Contribution
It provides a unified benchmark with diverse data sources, tasks, and baseline models, revealing critical insights into emission prediction challenges and model generalization.
Findings
Building emissions are more complex than company emissions.
A significant performance gap exists between in-distribution and out-of-distribution data.
Multimodal remote-sensing embeddings improve prediction accuracy where tabular models struggle.
Abstract
Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-year records from 12,000+ firms with Scope 1+2 and Scope 3 disclosures and financial/sectoral signals; the building track harmonises 491,591 building-year records from 13 open sources into a single schema across 26 metropolitan areas (10 U.S., 15 Australian, 1 Singaporean), with climate covariates and multimodal remote-sensing embeddings. GHGbench defines canonical splits with in-distribution and cross-region/city transfer as primary tasks and temporal hold-out plus short-horizon forecasting as supplementary appendix evidence; headline baselines span gradient-boosted trees, a tabular foundation…
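To make the difference between the in-distribution and cross-region/city transfer tasks concrete, here is a minimal sketch of the two kinds of split. It is not GHGbench's actual split code: the record fields (`city`, `year`, `building_id`), the function names, and the 20% test fraction are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical record layout; the real GHGbench schema may differ.
Record = Dict[str, object]


def in_distribution_split(
    records: Sequence[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Random split: train and test rows are drawn from the same cities."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def cross_city_split(
    records: Sequence[Record], held_out_cities: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Transfer split: every row from a held-out city goes to the test set."""
    train = [r for r in records if r["city"] not in held_out_cities]
    test = [r for r in records if r["city"] in held_out_cities]
    return train, test


# Toy data: 3 made-up cities x 2 years x 5 buildings = 30 building-year rows.
records = [
    {"city": city, "year": year, "building_id": f"{city}-{i}"}
    for city in ("A", "B", "C")
    for year in (2020, 2021)
    for i in range(5)
]

id_train, id_test = in_distribution_split(records)
ood_train, ood_test = cross_city_split(records, held_out_cities={"C"})

# In the transfer split, no test city is ever seen during training.
train_cities = {r["city"] for r in ood_train}
test_cities = {r["city"] for r in ood_test}
print(len(id_train), len(id_test), len(ood_train), len(ood_test))
print(train_cities.isdisjoint(test_cities))
```

In the random split a model sees every city during training, whereas the transfer split scores it only on cities it has never seen, which is the setting where an in-distribution versus out-of-distribution gap would show up.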
