# Kernel mean matching enhances risk estimation under spatial distribution shifts

**Authors:** Egor Serov, Diana Koldasbayeva, Alexey Zaytsev

PMC · DOI: 10.1038/s41598-026-36740-7 · Scientific Reports · 2026-02-02

## TL;DR

This paper shows that Kernel Mean Matching (KMM) estimates target-domain risk more accurately than no weighting, importance weighting, and classifier-based reweighting when spatial data exhibit clustered, high-dimensional distribution shifts.

## Contribution

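The study systematically evaluates four risk estimation methods (No Weighting, Importance Weighting, Kernel Mean Matching, and classifier-based reweighting) on synthetic benchmarks and real-world spatial datasets, and identifies KMM as the most robust of the four under spatial distribution shift.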

## Key findings

- KMM reduces Mean Absolute Percentage Error (MAPE) by 12.3–86.5% compared to alternatives in high-dimensional spatial settings.
- KMM bypasses error-prone density ratio estimation by directly minimizing distributional divergence via kernel embeddings.
- KMM performs consistently across ecological (species distributions) and biomedical (immune cell layouts) spatial datasets.
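
To make the second finding concrete, here is a minimal sketch of how KMM weights can be computed without ever estimating a density ratio. It is not the authors' implementation: it assumes a Gaussian (RBF) kernel, solves the standard KMM quadratic objective with plain projected gradient descent, and replaces the usual constraint on the sum of the weights with a final rescaling to mean 1. The names `rbf_kernel`, `kmm_weights`, `gamma`, `bound`, and `n_iter` are illustrative choices, not taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kmm_weights(x_src, x_tgt, gamma=1.0, bound=10.0, n_iter=2000):
    """Approximate KMM weights for the source samples (illustrative sketch).

    Minimises 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= bound,
    which matches the weighted source kernel mean to the target kernel mean.
    Assumption: the sum constraint of the usual formulation is replaced by
    rescaling the result to mean 1.
    """
    n_s, n_t = len(x_src), len(x_tgt)
    K = rbf_kernel(x_src, x_src, gamma)
    kappa = (n_s / n_t) * rbf_kernel(x_src, x_tgt, gamma).sum(axis=1)

    # Step size 1 / L, where L is the largest eigenvalue of K.
    step = 1.0 / np.linalg.eigvalsh(K)[-1]

    beta = np.ones(n_s)
    for _ in range(n_iter):
        grad = K @ beta - kappa
        # Gradient step followed by projection onto the box [0, bound].
        beta = np.clip(beta - step * grad, 0.0, bound)
    return beta / beta.mean()


# Toy example: the target is a shifted copy of the source distribution.
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(200, 1))
x_tgt = rng.normal(1.0, 1.0, size=(200, 1))
weights = kmm_weights(x_src, x_tgt)
print("unweighted source mean:", x_src.mean())
print("KMM-weighted source mean:", (weights * x_src[:, 0]).mean())
print("target mean:", x_tgt.mean())
```

In this toy setting the weighted source mean should move toward the target mean. A production version would typically hand the same objective to a dedicated quadratic programming solver and tune the kernel bandwidth, neither of which is shown here.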

## Abstract

Accurate risk estimation under distribution shifts is critical for deploying machine learning models in real-world spatial applications, from ecological forecasting to medical image analysis. Conventional methods such as No Weighting (NW) and Importance Weighting (IW) fail in spatially structured data due to two challenges: (1) density ratio estimation in high-dimensional clustered distributions and (2) non-stationarity from environmental gradients or sampling biases. Classifier-based approaches offer partial improvements but often yield miscalibrated risk estimates by prioritizing discriminative accuracy over distribution alignment. We conduct a systematic evaluation of four risk estimation methods, namely NW, IW, Kernel Mean Matching (KMM), and classifier-based reweighting, across synthetic benchmarks (with controlled spatial clustering) and real-world datasets (species distributions and immune cell layouts). Results show that KMM achieves superior robustness, reducing Mean Absolute Percentage Error (MAPE) by 12.3–86.5% compared to alternatives in high-dimensional settings. This advantage stems from KMM’s direct minimization of distributional divergence via kernel embeddings, bypassing error-prone density ratio estimation. Our findings demonstrate that KMM is a principled solution for spatial risk estimation, particularly when source and target distributions exhibit complex clustering or sampling artifacts. Its consistency across ecological and biomedical domains suggests broad applicability for reliable model deployment in spatially heterogeneous environments.
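
The comparison described in the abstract comes down to two small computations: a (possibly reweighted) average of per-sample source losses used as the target risk estimate, and the MAPE between such estimates and the true target risk. The sketch below illustrates both under stated assumptions: the weighted average is self-normalised (one common variant), and MAPE is averaged over repeated trials, which may differ from the paper's exact protocol. The function names `weighted_risk` and `mape`, and all numbers, are hypothetical.

```python
def weighted_risk(losses, weights=None):
    """Self-normalised weighted mean of per-sample losses.

    weights=None corresponds to No Weighting (NW); passing density-ratio,
    classifier-based, or KMM weights gives the reweighted estimators.
    """
    if weights is None:
        weights = [1.0] * len(losses)
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


def mape(estimates, truths):
    """Mean Absolute Percentage Error between estimated and true risks."""
    errors = [abs((e - t) / t) for e, t in zip(estimates, truths)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical per-sample losses on source data and made-up sample weights.
losses = [0.2, 0.4, 1.0]
weights = [0.5, 1.0, 1.5]
print("NW estimate:", weighted_risk(losses))
print("reweighted estimate:", weighted_risk(losses, weights))
```

A lower MAPE means the risk estimated from source data tracks the risk actually incurred on the target distribution more closely, which is the sense in which the abstract reports reductions of 12.3–86.5%.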

## Full-text entities

- **Diseases:** pneumonia (MESH:D011014), cancer (MESH:D009369)
- **Species:** Tussilago farfara (coltsfoot, species) [taxon 118778], Anemone nemorosa (species) [taxon 37489], Caltha palustris (species) [taxon 3449]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12917278/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12917278/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12917278/full.md

---
Source: https://tomesphere.com/paper/PMC12917278