# Calibrated Uncertainty Estimation for Soil Organic Carbon from Raman Spectra

**Authors:** Jeffrey K. Wiens, Natalia Solomatova, Sadegh Shokatian

PMC · DOI: 10.1021/acs.analchem.5c04616 · Analytical Chemistry · 2025-12-11

## TL;DR

This paper introduces a framework for estimating calibrated uncertainties in soil organic carbon predictions from Raman spectra, using conformal prediction and multiple uncertainty quantification methods.

## Contribution

To the authors' knowledge, this is the first unified framework that combines conformal calibration with multiple UQ strategies for Raman-based SOC estimation.

## Key findings

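- Conformalization yields well-calibrated uncertainty estimates across all evaluated methods (Deep Ensembles, Bayesian neural networks, Monte Carlo Dropout, quantile regression, and heteroscedastic Gaussian models); ablations show many of them are poorly calibrated without it.
- Uncertainty in SOC estimation is predominantly aleatoric, suggesting that further gains depend more on spectral quality and preprocessing than on model complexity.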
- The framework produces narrow, well-calibrated prediction intervals with reliable empirical coverage.
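
The aleatoric/epistemic split mentioned in the findings above can be illustrated with a small sketch. It assumes an ensemble whose members each output a predictive mean and variance, and uses the standard law-of-total-variance decomposition; the summary does not state the exact decomposition the authors used, and the function name `decompose_uncertainty` and the toy numbers are hypothetical.

```python
import numpy as np

def decompose_uncertainty(member_means, member_vars):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    member_means, member_vars: arrays of shape (n_members, n_samples).
    Assumption: each member predicts a Gaussian mean and variance per sample.
    """
    member_means = np.asarray(member_means, dtype=float)
    member_vars = np.asarray(member_vars, dtype=float)
    # Aleatoric: average noise variance the members attribute to the data.
    aleatoric = member_vars.mean(axis=0)
    # Epistemic: disagreement between members about the mean.
    epistemic = member_means.var(axis=0)
    return aleatoric, epistemic, aleatoric + epistemic

# Toy example: three members, two samples (illustrative values only).
means = np.array([[1.0, 2.0], [1.2, 2.0], [0.8, 2.0]])
variances = np.array([[0.5, 0.3], [0.5, 0.3], [0.5, 0.3]])
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
aleatoric_share = aleatoric / total  # fraction of variance that is aleatoric
```

An `aleatoric_share` close to 1 for most samples would correspond to the "predominantly aleatoric" regime described above, where adding model capacity or ensemble members is unlikely to narrow the intervals much.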

## Abstract

Machine learning (ML) is a powerful tool for inferring chemometric
properties from Raman spectra, expanding the information extractable
from high-dimensional spectral data. A growing application is the
estimation of soil organic carbon (SOC), where ML models relate overlapping
Raman and fluorescence features to chemical composition. However,
these models typically lack calibrated, prediction-level uncertainty
estimates, which limits their utility in decision-critical contexts.
We present a framework for quantifying predictive uncertainty in SOC
estimation from Raman spectra using Shifted Excitation Raman Difference
Spectroscopy (SERDS). The approach employs conformal prediction (CP)
to generate statistically valid prediction intervals using a held-out
calibration data set and is compatible with a variety of uncertainty
quantification (UQ) methods. To our knowledge, this is the first unified
framework that integrates conformal calibration with multiple UQ strategies
for Raman-based SOC estimation, addressing both aleatoric (irreducible)
and epistemic (reducible) sources of uncertainty in a field-relevant
setting. We assess the framework across several regression models,
including Deep Ensembles, Bayesian neural networks, Monte Carlo Dropout,
quantile regression, and heteroscedastic Gaussian models. All methods,
when conformalized, produced well-calibrated uncertainty estimates
with narrow prediction intervals, achieving reliable empirical coverage
across confidence levels. Ablation studies revealed that many UQ techniques
were poorly calibrated without conformalization. Our findings indicate
that uncertainty in this task is predominantly aleatoric in nature,
suggesting that improvements in predictive performance will depend
more on improving spectral quality and preprocessing than on model
complexity. This framework provides a practical, generalizable solution
for generating trustworthy, calibrated, sample-specific uncertainty
estimates in Raman-based chemometric analyses.
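
As a rough illustration of how conformal prediction with a held-out calibration set can recalibrate a model's uncertainty estimates, here is a minimal sketch of split conformal prediction with normalized residual scores. It is not the authors' implementation: the score function, the helper names (`conformal_quantile`, `conformalize`, `simulate`), and the synthetic data standing in for SOC predictions are all assumptions made for this example.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the score to use
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]

def conformalize(mu_cal, sigma_cal, y_cal, mu_test, sigma_test, alpha=0.1):
    """Scale the model's own sigma so intervals target 1 - alpha coverage."""
    # Nonconformity score: absolute error in units of predicted sigma.
    scores = np.abs(y_cal - mu_cal) / sigma_cal
    q = conformal_quantile(scores, alpha)
    return mu_test - q * sigma_test, mu_test + q * sigma_test

# Synthetic stand-in for a heteroscedastic regressor (assumption, not SOC data).
rng = np.random.default_rng(0)

def simulate(n):
    x = rng.uniform(0.0, 1.0, n)
    sigma_true = 0.2 + 0.5 * x
    y = 2.0 * x + sigma_true * rng.standard_normal(n)
    mu_hat = 2.0 * x                 # pretend the mean prediction is accurate
    sigma_hat = 0.5 * sigma_true     # but the predicted sigma is overconfident
    return mu_hat, sigma_hat, y

mu_cal, sigma_cal, y_cal = simulate(1000)    # held-out calibration set
mu_test, sigma_test, y_test = simulate(2000)

lower, upper = conformalize(mu_cal, sigma_cal, y_cal, mu_test, sigma_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))

# Uncalibrated baseline: nominal 90% Gaussian interval from the raw sigma.
raw_coverage = np.mean(np.abs(y_test - mu_test) <= 1.645 * sigma_test)
```

In this toy setup the raw Gaussian intervals undercover because the predicted sigma is too small, while the conformalized intervals should land near the 90% target. The same wrapper could in principle sit on top of any method that emits a per-sample spread, which is one plausible reading of the framework being "compatible with a variety of UQ methods".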

## Full-text entities

- **Chemicals:** Organic Carbon

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12809652/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12809652/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12809652/full.md

---
Source: https://tomesphere.com/paper/PMC12809652