GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction

Yifan Duan; Siyuan Zheng; Lihuan Li; Chao Xue; Flora Salim

arXiv:2605.13743 · cs.LG · May 14, 2026

TL;DR

GHGbench is an open dataset and benchmark for multi-entity, multi-task carbon-emission prediction at the company and building levels, with baseline results that expose the main open challenges.

Contribution

GHGbench provides a unified benchmark that combines diverse data sources, tasks, and baseline models, and uses it to show where emission prediction is difficult and how well models generalize.

Findings

01

Building-level emissions are more complex to predict than company-level emissions.

02

A significant performance gap exists between in-distribution and out-of-distribution data.

03

Multimodal remote-sensing embeddings improve prediction accuracy where tabular models struggle.

Abstract

Open datasets and benchmarks for entity-level carbon-emission prediction remain fragmented across access, scale, granularity, and evaluation. We introduce GHGbench, an open dataset and benchmark for company- and building-level greenhouse-gas prediction. The company track contains 32,000+ company-year records from 12,000+ firms with Scope 1+2 and Scope 3 disclosures and financial/sectoral signals; the building track harmonises 491,591 building-year records from 13 open sources into a single schema across 26 metropolitan areas (10 U.S., 15 Australian, 1 Singaporean), with climate covariates and multimodal remote-sensing embeddings. GHGbench defines canonical splits with in-distribution and cross-region/city transfer as primary tasks and temporal hold-out plus short-horizon forecasting as supplementary appendix evidence; headline baselines span gradient-boosted trees, a tabular foundation…
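
To make the two primary task types in the abstract concrete, the following is a minimal sketch of how an in-distribution split differs from a cross-city transfer split. It is not GHGbench's canonical split code: the record fields (`city`, `year`, `building_id`), the function names, the toy city labels, and the 20% test fraction are all assumptions chosen for illustration.

```python
import random


def in_distribution_split(records, test_frac=0.2, seed=0):
    """Random split: train and test are drawn from the same pool of cities."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def cross_city_split(records, held_out_cities):
    """Transfer split: every record from a held-out city goes to the test set."""
    held_out = set(held_out_cities)
    train = [r for r in records if r["city"] not in held_out]
    test = [r for r in records if r["city"] in held_out]
    return train, test


# Hypothetical toy building-year records (not real GHGbench data).
records = [
    {"city": city, "year": year, "building_id": f"{city}-{i}"}
    for city in ("city_a", "city_b", "city_c")
    for year in (2020, 2021)
    for i in range(5)
]

id_train, id_test = in_distribution_split(records)
ood_train, ood_test = cross_city_split(records, held_out_cities=["city_c"])
```

In the transfer split the model never sees a record from the held-out city during training, so a score on `ood_test` measures generalization to a new city rather than interpolation within known ones.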
