# A hierarchical dataset on multiple energy consumption and PV generation with emissions and weather information

**Authors:** Hanjiang Dong, Jizhong Zhu, Chi-yung Chung, Zipeng Liang, Haosen Yang, Xiyu Wen

PMC · DOI: 10.1038/s41597-025-06010-8 · Scientific Data · 2025-10-31

## TL;DR

This paper introduces HEEW, a hierarchical dataset that combines hourly energy consumption, PV generation, GHG emissions, and weather data for 147 individual buildings, four aggregated communities, and the region as a whole.

## Contribution

The contribution is a hierarchical, multi-source dataset that links building-, community-, and region-level energy, emissions, and weather records, released together with a baseline cleaning scheme so that it can serve as a benchmark.
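To make the hierarchy concrete, here is a minimal sketch of rolling building-level hourly values up to community and region level. The building IDs, the community mapping, and the use of plain summation are all assumptions for illustration; this summary does not state how the published community and region series were derived.

```python
from collections import defaultdict

# Hypothetical mapping of building IDs to communities; the real assignment
# is defined by the dataset, not by this summary.
COMMUNITY_OF = {"B001": "C1", "B002": "C1", "B003": "C2"}


def aggregate(records):
    """Roll building-level hourly values up to community and region level.

    `records` is an iterable of (building_id, timestamp, value) tuples.
    Summation is an assumption made here for illustration.
    """
    community = defaultdict(float)
    region = defaultdict(float)
    for building_id, timestamp, value in records:
        community[(COMMUNITY_OF[building_id], timestamp)] += value
        region[timestamp] += value
    return dict(community), dict(region)


# Three made-up building readings for a single hour.
records = [
    ("B001", "2014-01-01T00:00", 10.0),
    ("B002", "2014-01-01T00:00", 5.0),
    ("B003", "2014-01-01T00:00", 2.5),
]
community_totals, region_totals = aggregate(records)
```

Keying the totals by `(community, timestamp)` mirrors the dataset's indexing by entity ID and timestamp, so the same lookup pattern works at every level of the hierarchy.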

## Key findings

- The dataset includes 11,987,328 records spanning 2014-2022 with 13 hourly variables.
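- A two-stage baseline cleaning scheme imputes missing values and corrects abnormal values; the corrected series serve as artificial ground truth for benchmarking.
- The dataset is publicly available on figshare in CSV format for tasks such as imputation, anomaly detection, forecasting, and optimization.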
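The record count can be sanity-checked with a little arithmetic. The sketch below assumes that one record is one hourly row and that every entity (147 buildings, four communities, one region) has the same number of rows; neither assumption is confirmed by this summary.

```python
from datetime import date

entities = 147 + 4 + 1            # buildings + communities + region
records = 11_987_328              # total stated in the paper

# Split the total evenly across entities, then convert hours to days.
hours_per_entity, remainder = divmod(records, entities)
days_per_entity, leftover_hours = divmod(hours_per_entity, 24)

# Calendar days in the stated range, both endpoints included.
calendar_days = (date(2022, 12, 31) - date(2014, 1, 1)).days + 1
```

Under these assumptions the total divides evenly into 78,864 hourly rows per entity, or 3,286 full days. The stated range of 1 January 2014 to 31 December 2022 covers 3,287 calendar days, so the count is one day per entity short of the full range; this summary does not say which day is absent or why.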

## Abstract

This study constructs a multi-source, hierarchical dataset of energy consumption, photovoltaic (PV) power generation, greenhouse gas (GHG) emissions, and weather information, dubbed Hierarchical Energy, Emissions, and Weather (HEEW). The dataset contains 11,987,328 records for 147 individual buildings, four aggregated communities, and the entire region, structured as time-series tables indexed by building ID and timestamp from 1 January 2014 to 31 December 2022. It includes 13 hourly variables. Energy records involve PV output and total energy consumption of electricity, heat, and cooling loads. Weather records involve temperature, dew point, humidity, wind speed, wind gust, pressure, and precipitation. GHG emissions are estimated as net values: the emissions from energy consumption minus the offset from PV output. To make the dataset usable in practice, we develop a two-stage baseline data cleaning scheme, available on GitHub, in which missing values are imputed and abnormal values are corrected to serve as artificial ground truth. The real-world dataset, hosted on figshare in CSV format, serves as a benchmark for imputation, anomaly detection, clustering, decomposition, classification, forecasting, and optimization.
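The two-stage idea (impute first, then correct abnormal values) can be sketched as follows. This is not the authors' scheme, whose details are not given in this summary: linear interpolation and a median/MAD rule are stand-in techniques, and the function names, the threshold `k`, and the sample series are hypothetical.

```python
from statistics import median


def impute_linear(series):
    """Stage 1: fill None gaps by linear interpolation between known neighbours.

    Gaps at either end are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def correct_outliers(series, k=5.0):
    """Stage 2: replace points more than k MADs from the median by the median."""
    centre = median(series)
    mad = median(abs(v - centre) for v in series)
    if mad == 0:
        return list(series)  # no spread, so nothing can be flagged
    return [centre if abs(v - centre) > k * mad else v for v in series]


# Made-up hourly readings with two gaps and one spike.
raw = [1.0, None, 3.0, 100.0, 5.0, None]
cleaned = correct_outliers(impute_linear(raw))
```

Running imputation before outlier correction means the second stage always sees a complete series. Replacing a spike with the global median is crude for seasonal load data; a rolling window would be a more natural choice in practice.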

## Full-text entities

- **Chemicals:** CH4 (MESH:D008697), mercury (MESH:D008628), N2O (MESH:D009609), water (MESH:D014867), GHG (MESH:D000074382), CO2 (MESH:D002245), Greenhouse (-), NO2 (MESH:D009585)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12579212/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12579212/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12579212/full.md

---
Source: https://tomesphere.com/paper/PMC12579212