# An electronic product carbon footprint dataset for question answering

**Authors:** Kaiwen Zhao, Ajesh Koyatan Chathoth, Bharathan Balaji, Stephen Lee

PMC · DOI: 10.1038/s41597-026-06544-5 · 2026-01-14

## TL;DR

This paper introduces a carbon question-answering (QA) dataset for extracting and analyzing carbon footprint data from sustainability reports of computing products, with the aim of standardizing emissions information.

## Contribution

The contribution is a carbon QA dataset that supports structured extraction of component-specific emissions data from unstructured sustainability reports (text, tables, and graphs embedded in PDFs).

## Key findings

- The dataset includes annotated metadata, numerical reasoning tasks, and structured derivations, intended to ensure accurate processing of fragmented information.
- Approximately 75% of the products in the dataset follow the PAIA (MIT) carbon footprinting model, so the dataset primarily reflects PAIA-style reporting practices.
- The dataset provides a foundation for training language models to automate the aggregation and standardization of emissions data for ICT systems.
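
To make the idea of a numerical reasoning task with a structured derivation more concrete, the following is a minimal sketch of one such question: given a product's total footprint and a percentage breakdown, derive the absolute emissions of one life-cycle stage. The record layout, field names (`CarbonReport`, `stage_share_pct`), and all numbers are assumptions made up for illustration; they are not the dataset's actual schema or values from any real report.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CarbonReport:
    """Hypothetical record; field names are illustrative, not the dataset's schema."""
    product: str
    total_kg_co2e: float               # total product carbon footprint
    stage_share_pct: Dict[str, float]  # life-cycle stage -> share of total (%)


def stage_emissions(report: CarbonReport, stage: str) -> float:
    """Derive absolute emissions for one stage: total * share / 100."""
    return report.total_kg_co2e * report.stage_share_pct[stage] / 100.0


# Made-up numbers for illustration only; not taken from any real report.
example = CarbonReport(
    product="Example Laptop",
    total_kg_co2e=300.0,
    stage_share_pct={
        "manufacturing": 80.0,
        "transport": 5.0,
        "use": 14.0,
        "end_of_life": 1.0,
    },
)

# Question: "How many kg CO2e does manufacturing contribute?"
manufacturing = stage_emissions(example, "manufacturing")
print(f"{manufacturing:.1f} kg CO2e")
```

A QA pair built this way would carry the question, the extracted inputs (total and share), and the derivation step, so that a model's answer can be checked against the arithmetic rather than only against a final number.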

## Abstract

The embodied carbon of computing systems constitutes a significant portion of their greenhouse gas (GHG) emissions. To support environmental initiatives and meet evolving standards, many companies now disclose product carbon footprints in sustainability reports, often with detailed breakdowns. Yet these reports appear in diverse and unstructured formats—text, tables, and graphs embedded in PDFs—creating major challenges for extracting and analyzing component-specific emissions data. This lack of standardization limits comparative assessments and opportunities for targeted reductions. To address this, we introduce a carbon question-answering (QA) dataset designed to enable the extraction and analysis of data from carbon reports of computing products. The dataset features annotated metadata, numerical reasoning tasks, and structured derivations to ensure accurate processing of fragmented information. Because approximately 75% of products in the dataset follow the PAIA (MIT) model for carbon footprinting, the dataset primarily reflects PAIA-style reporting practices, offering insight into how industry methods influence reported values. This work establishes a foundation for training advanced language models to automate aggregation and standardization of emissions data for ICT systems.

## Full-text entities

- **Chemicals:** carbon (MESH:D002244)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12901965/full.md

---
Source: https://tomesphere.com/paper/PMC12901965