CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Matthew Fortier, Mats L. Richter, Oliver Sonnentag, Chris Pal

TL;DR
CarbonSense is the first comprehensive, machine learning-ready dataset combining flux measurements, meteorological data, and satellite imagery to advance data-driven carbon flux modeling with multimodal deep learning.
Contribution
We introduce CarbonSense, a standardized multimodal dataset for carbon flux modeling and establish baseline models including a novel transformer-based approach.
Findings
Multimodal deep learning improves carbon flux prediction accuracy.
Baseline models demonstrate the dataset's utility for model development.
The dataset covers 385 global locations, enabling diverse model training.
Abstract
Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO₂ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote comparisons between models. To address this gap, we present CarbonSense, the first machine learning-ready dataset for DDCFM. CarbonSense integrates measured carbon fluxes, meteorological predictors, and satellite imagery from 385 locations across the globe, offering comprehensive coverage and facilitating robust model training. Additionally, we provide a baseline model using a current state-of-the-art DDCFM approach and a novel transformer-based model. Our experiments illustrate the potential…
Peer Reviews
Decision·ICLR 2025 Poster
CarbonSense integrates diverse data modalities—measured carbon fluxes, meteorological predictors, and satellite imagery—across a wide array of ecosystems. Researchers can use this dataset as a standardized benchmark.
1. The dataset was compiled from multiple sources with various modalities, which may introduce inconsistencies or OOD samples during model training. Careful data analysis can be helpful.
2. The experiments show that the proposed EcoPerceiver outperformed the current SOTA approach for most IGBP types, especially WET, WAT, and ENF. However, the paper did not include an ablation study to showcase why the proposed model achieved this performance.
3. There is only one baseline compared and there is no model
* The paper is well written and cohesive. It is easy to follow and understand, even for non-experts in the field.
* Adds a clear contribution to existing datasets in terms of scale and modalities added. This will definitely help progress in the field.
* The dataset will be open for anyone to use. This is important for it to make any impact.
* The satellite imagery added is very low resolution, which limits its potential usefulness.
* The dataset does not include many observations outside developed countries. There is nothing the authors can do about this, since they leverage existing EC stations.
- The benchmark has multimodal inputs, which helps fill a notable gap in multimodal benchmarks for remote sensing/geospatial ML.
- The paper presents useful and approachable background about the carbon flux modeling problem.
- The benchmark code is designed to allow flexibility in reproducing and extending or modifying the dataset based on user needs.
- The benchmark has a permissive CC-BY license.
- The EcoPerceiver method is well motivated based on the domain-specific carbon flux modeling pro
- The train/test splits are divided by station location, which avoids spatial autocorrelation issues. It seems there could be significant temporal autocorrelation within each split since there are many measurements from the same location. Is temporal autocorrelation a concern?
- The experiments only compared two models, XGBoost and EcoPerceiver. It would be useful to see additional models benchmarked (especially deep learning models) to get a sense of the variation in performance existing soluti
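The station-level split mentioned in the review above can be illustrated with a minimal sketch. This is not the CarbonSense code: the record layout (`station`, `hour`, `flux` keys), the function name `split_by_station`, and the 20% test fraction are all hypothetical, chosen only to show how holding out whole stations keeps every measurement from one location on a single side of the split.

```python
import random


def split_by_station(records, test_fraction=0.2, seed=0):
    """Split records so that no station contributes to both train and test.

    `records` is a list of dicts with a "station" key (hypothetical layout).
    """
    # Work on unique station IDs, not individual measurements.
    stations = sorted({r["station"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(stations)

    # Hold out at least one whole station for testing.
    n_test = max(1, round(len(stations) * test_fraction))
    test_stations = set(stations[:n_test])

    train = [r for r in records if r["station"] not in test_stations]
    test = [r for r in records if r["station"] in test_stations]
    return train, test


# Toy data: 5 hypothetical stations with 10 hourly measurements each.
records = [{"station": f"S{i % 5}", "hour": i, "flux": 0.1 * i} for i in range(50)]
train, test = split_by_station(records)

train_ids = {r["station"] for r in train}
test_ids = {r["station"] for r in test}
print(len(train), len(test), train_ids & test_ids)
```

Splitting on station IDs addresses leakage across locations only; measurements within a held-out station remain temporally correlated with one another, which is the separate concern the reviewer raises.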
Taxonomy
Topics: Atmospheric and Environmental Gas Dynamics · Carbon Dioxide Capture Technologies · Vehicle emissions and performance
