AgroFlux: A Spatial-Temporal Benchmark for Carbon and Nitrogen Flux Prediction in Agricultural Ecosystems
Qi Cheng, Licheng Liu, Yao Zhang, Mu Hong, Yiqun Xie, Xiaowei Jia

TL;DR
This paper introduces a comprehensive spatial-temporal benchmark dataset for carbon and nitrogen flux prediction in agroecosystems, integrating simulations and real-world data to enhance AI model development.
Contribution
It provides the first benchmark dataset combining physics-based simulations and real observations for agroecosystem flux prediction, enabling better AI model evaluation and development.
Findings
Deep learning models show varying performance on flux prediction tasks.
Transfer learning improves model generalization on real-world data.
The benchmark facilitates development of more accurate AI models for ecosystem management.
Abstract
Agroecosystems, which are heavily influenced by human actions and account for a quarter of global greenhouse gas (GHG) emissions, play a crucial role in mitigating global climate change and securing environmental sustainability. However, we can't manage what we can't measure. Accurately quantifying the pools and fluxes in the carbon, nutrient, and water nexus of the agroecosystem is therefore essential for understanding the underlying drivers of GHG emissions and developing effective mitigation strategies. Conventional approaches such as soil sampling, process-based models, and black-box machine learning models face challenges such as data sparsity, high spatiotemporal heterogeneity, and complex subsurface biogeochemical and physical processes. Developing new trustworthy approaches, such as AI-empowered models, will require AI-ready benchmark datasets and outlined protocols, which…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* Extensive empirical comparison of multiple deep learning architectures (LSTM, 1D-CNN, Transformer) across a diverse set of biogeophysical prediction tasks.
* The paper reflects substantial domain knowledge from the biogeosciences, both in how variables are selected/aggregated and how the tasks are framed.
* Limited relevance for the broader ML community outside biogeosciences. The paper does not clearly articulate which underlying modeling patterns or dataset properties are generalizable beyond this specific domain, or why this benchmark represents a unique opportunity for ML research at large.
* Results are largely inconclusive and there is no methodological innovation. The baseline comparisons do not yield a clear insight or takeaway that advances our understanding of model behavior or design p
1. Comprehensive integration of PBM simulation data and real-world observational datasets at daily granularity.
2. Covers a wide range of environmental and management variables across different sites and conditions.
3. Standardized prediction tasks and consistent evaluation metrics (R², RMSE, MAE) enable fair, reproducible assessment.
4. Includes transfer learning benchmarks, pushing forward domain adaptation research.
5. Provides baseline performances on state-of-the-art sequential deep learnin
1. The dataset may still be limited to certain regions or crop types, possibly restricting generalizability.
2. Complexity in data integration from multiple sources may pose application challenges.
3. Machine learning models' performance could be sensitive to the high spatio-temporal variability of agricultural fluxes.
4. PBMs themselves have inherent biases that might propagate into benchmarks.
5. A data-driven approach alone might not be sufficient to capture highly complex agricultural fluxes.
1. The monitoring of greenhouse gas surface fluxes is highly relevant for verification of international climate agreements, and at the same time quite challenging due to gaps in our understanding of ecosystem processes and limited observations.
2. This work allows for multi-task (NEE, GPP & N₂O flux) training of models, with potential benefits across tasks.
3. The presented process-based model simulations allow for synthetic experiments to better elucidate the extrapolation capabilities of cu
Major comments:
1. A lot of the data has already been introduced in Liu et al. 2024. What exactly is the contribution of this paper?
2. Insufficient baselines. FluxCom X-Base should be included, as should an XGBoost model trained on your dataset and the two process-based models Ecosys and DayCent.
3. Mediocre performance on the synthetic dataset. Why do you achieve only mediocre performance on a mere emulation task? Are your models too small or not properly tuned? Are you extrapolating in feature s
1. The paper tackles an important environmental and agricultural challenge: modeling carbon and nitrogen fluxes to support climate-change mitigation. The introduction provides solid context linking agriculture, greenhouse gas emissions, and the need for AI-ready benchmarks.
2. The authors define consistent data splits, metrics, and tasks for both simulated and observational datasets, facilitating fair model comparison.
3. Baseline results for six state-of-the-art time-series models (LSTM, TCN,
1. The major limitation is that AgroFlux primarily relies on synthetic simulations from process-based models (Ecosys and DayCent). Although the paper claims to integrate observational data, the latter is extremely limited in both spatial and temporal coverage (e.g., 11 flux-tower sites and one small-scale N₂O experiment).
2. The contribution of this work is primarily dataset engineering, and the idea of combining PBM simulations with flux-tower observations has been explored before (e.g., FLUXC
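The reviews above name R², RMSE, and MAE as the benchmark's evaluation metrics. As a minimal sketch of how these three scores are conventionally computed for a flux time series, the snippet below uses plain Python; the function names and the `observed`/`predicted` values are hypothetical illustrations, not the authors' evaluation code or data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily flux values, made up purely for illustration.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"R2={r2(observed, predicted):.3f}")
print(f"RMSE={rmse(observed, predicted):.3f}")
print(f"MAE={mae(observed, predicted):.3f}")
```

Reporting all three together is a common choice because R² is scale-free while RMSE and MAE stay in the units of the flux, with RMSE penalizing large errors more heavily than MAE.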
Taxonomy
Topics: Plant Water Relations and Carbon Dynamics · Soil Geostatistics and Mapping · Remote Sensing in Agriculture
