LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping
J. Schmidinger, S. Vogel, V. Barkov, A.-D. Pham, R. Gebbers, H. Tavakoli, J. Correa, T. R. Tavares, P. Filippi, E. J. Jones, V. Lukas, E. Boenecke, J. Ruehlmann, I. Schroeter, E. Kramer, S. Paetzold, M. Kodaira, A. M. J.-C. Wadoux, L. Bragazza, K. Metzger, J. Huang

TL;DR
LimeSoDa is an open-access collection of 31 diverse soil datasets for benchmarking machine learning regressors in digital soil mapping; the benchmark shows that which algorithm performs best depends on the dataset.
Contribution
This paper introduces LimeSoDa, a dataset collection for benchmarking soil property prediction methods. It addresses a limitation of previous studies, which typically rely on a single dataset.
Findings
No single algorithm performs best across all datasets.
Multiple linear regression (MLR) and support vector regression (SVR) perform better on high-dimensional spectral data.
CatBoost and random forest (RF) perform better on datasets with fewer features.
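The benchmarking idea behind these findings (score several regressors on each dataset and compare them dataset by dataset) can be sketched in plain Python. This is a minimal, stdlib-only illustration, not the paper's pipeline. The two synthetic datasets, the 5-fold split, the R² scoring, and the use of k-nearest neighbours (kNN) as a stand-in nonlinear learner (in place of CatBoost, RF or SVR, which need external libraries) are all assumptions made for the example.

```python
import random
from statistics import fmean

def r2_score(y_true, y_pred):
    """Coefficient of determination over all (pooled) predictions."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_mlr(X, y, ridge=1e-8):
    """Multiple linear regression via the normal equations (tiny ridge for stability)."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    w = _solve(xtx, xty)
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regression (Euclidean distance), a stand-in nonlinear learner."""
    train = list(zip(X, y))
    def predict(row):
        nearest = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], row)))[:k]
        return fmean(t for _, t in nearest)
    return predict

def fit_mean(X, y):
    """Baseline that always predicts the training mean."""
    mu = fmean(y)
    return lambda row: mu

def cross_val_r2(fit, X, y, k=5, seed=0):
    """k-fold cross-validation; returns R^2 of the pooled out-of-fold predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for fold in range(k):
        test = idx[fold::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(X[i])
    return r2_score(y, preds)

def make_dataset(n, n_features, target, noise, seed):
    """Synthetic data: uniform features in [-1, 1], Gaussian noise on the target."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [target(row) + rng.gauss(0.0, noise) for row in X]
    return X, y

datasets = {
    # More features, linear signal
    "many_linear": make_dataset(200, 8, lambda r: sum((j + 1) * v for j, v in enumerate(r)), 0.1, 1),
    # One feature, nonlinear signal
    "few_nonlinear": make_dataset(200, 1, lambda r: r[0] ** 2, 0.05, 2),
}
models = {"MLR": fit_mlr, "kNN": fit_knn, "mean": fit_mean}

results = {
    name: {m: cross_val_r2(fit, X, y) for m, fit in models.items()}
    for name, (X, y) in datasets.items()
}
for name, scores in results.items():
    best = max(scores, key=scores.get)
    print(name, {m: round(s, 3) for m, s in scores.items()}, "best:", best)
```

On the linear dataset with more features MLR should come out on top, while on the one-feature nonlinear dataset kNN should win. This is only a toy version of the point that the best regressor depends on the dataset; it does not reproduce the paper's experiments.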
Abstract
Digital soil mapping (DSM) relies on a broad pool of statistical methods, yet determining the optimal method for a given context remains challenging and contentious. Benchmarking studies on multiple datasets are needed to reveal strengths and limitations of commonly used methods. Existing DSM studies usually rely on a single dataset with restricted access, leading to incomplete and potentially misleading conclusions. To address these issues, we introduce an open-access dataset collection called Precision Liming Soil Datasets (LimeSoDa). LimeSoDa consists of 31 field- and farm-scale datasets from various countries. Each dataset has three target soil properties: (1) soil organic matter or soil organic carbon, (2) clay content and (3) pH, alongside a set of features. Features are dataset-specific and were obtained by optical spectroscopy, proximal and remote soil sensing. All datasets…
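The structure described in the abstract (three target soil properties per dataset, plus a dataset-specific feature set) maps naturally onto a small container type. The sketch below is hypothetical: the class `SoilDataset`, the target key `SOM_or_SOC`, the feature names and all toy values are illustrations, not LimeSoDa's file format or any published package API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical target keys; the first target is SOM or SOC depending on the dataset.
TARGETS = ("SOM_or_SOC", "Clay", "pH")

@dataclass
class SoilDataset:
    """One field- or farm-scale dataset: a feature table plus three soil targets."""
    name: str
    feature_names: List[str]          # dataset-specific features
    features: List[List[float]]       # one row per soil sample
    targets: Dict[str, List[float]]   # target name -> one value per sample

    def __post_init__(self) -> None:
        # Every dataset must carry all three targets.
        missing = [t for t in TARGETS if t not in self.targets]
        if missing:
            raise ValueError(f"{self.name}: missing targets {missing}")
        # Feature rows and target vectors must line up sample by sample.
        n = len(self.features)
        for row in self.features:
            if len(row) != len(self.feature_names):
                raise ValueError(f"{self.name}: row length does not match feature_names")
        for t in TARGETS:
            if len(self.targets[t]) != n:
                raise ValueError(f"{self.name}: target {t} has the wrong length")

    def xy(self, target: str) -> Tuple[List[List[float]], List[float]]:
        """Return copies of (X, y) for one target, ready for a regressor."""
        if target not in TARGETS:
            raise KeyError(target)
        return [list(row) for row in self.features], list(self.targets[target])

# Illustrative toy values only; not taken from LimeSoDa.
toy = SoilDataset(
    name="toy_field",
    feature_names=["ECa", "elevation"],
    features=[[12.1, 85.0], [15.4, 84.2], [9.8, 86.1]],
    targets={"SOM_or_SOC": [2.1, 2.6, 1.8], "Clay": [14.0, 19.5, 11.2], "pH": [6.4, 6.9, 6.1]},
)
X, y = toy.xy("pH")
print(len(X), y)  # 3 [6.4, 6.9, 6.1]
```

Keeping the three targets in one object makes it straightforward to loop a benchmark over datasets and targets while the features stay dataset-specific.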
