# Overcoming missing data in spatial metabolomics with machine learning imputation to accelerate downstream discovery

**Authors:** Tingze Feng, Yuhan Wang, Shaojun Pei, Qiuping Wang, Yirong Li, Jing Lv, Tian Xia, Di Chen, Hai-long Piao

PMC · DOI: 10.1016/j.isci.2026.115203 · 2026-03-03

## TL;DR

This study benchmarks eight methods for imputing missing values in spatial metabolomics and finds that Random Forest, followed by a graph convolutional network (GCN)-based method, performs best on both imputation accuracy and preservation of spatial cluster structure.

## Contribution

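The study introduces a graph convolutional network (GCN)-based imputation method designed for spatial metabolomics data, together with a dual-criteria benchmark framework that evaluates imputation accuracy and preservation of spatial cluster structure.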

## Key findings

- Random Forest achieved the highest overall performance in imputation accuracy and spatial cluster preservation.
- A graph convolutional network (GCN) ranked second in both imputation accuracy and spatial structure preservation.
- The benchmark framework was applied to six datasets spanning mouse brain and liver, human kidney and stomach, and plant seed sections.
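
To make the spatial cluster preservation criterion concrete, the sketch below compares cluster labels obtained from complete data with labels obtained after imputation. It is a minimal illustration only: the summary does not state which agreement metric the authors used, so the adjusted Rand index (ARI) is an assumption here, and the function name `adjusted_rand_index` and the toy label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same pixels.

    ASSUMPTION: ARI stands in for whatever cluster-agreement metric the
    paper actually uses; it is shown only to illustrate the idea.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two pixels to compare clusterings")

    # Contingency counts: pixels per (cluster_a, cluster_b) pair and per cluster.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2        # agreement of a perfect match
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put every pixel in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical cluster labels for six pixels: before and after imputation.
reference_labels = [0, 0, 0, 1, 1, 1]
imputed_labels = [1, 1, 1, 0, 0, 0]  # same partition, different label names
print(adjusted_rand_index(reference_labels, imputed_labels))
```

Because ARI depends only on how pixels are grouped, relabeling the clusters does not change the score, which is useful when clustering is rerun independently on each imputed dataset.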

## Abstract

Mass spectrometry imaging (MSI)-based spatial metabolomics data contain extensive missing values, yet practical guidance on how imputation choices affect both imputation accuracy and downstream spatial analyses remains limited. In this study, we evaluated eight imputation methods, including both existing approaches and a graph convolutional network (GCN)-based method specifically designed for spatial metabolomics data, to identify suitable approaches for this data type. To enable comprehensive assessment, we developed an evaluation framework focusing on two objective criteria: (a) imputation accuracy and (b) preservation of spatial cluster structure. We assembled six benchmark datasets spanning mouse brain and liver, human kidney and stomach, and plant seed sections, and conducted controlled dropout simulations of missing values. Random Forest (RF) ranked first overall across the two criteria, and GCN ranked second on both. Overall, this systematic, dual-perspective benchmark study provides guidance for selecting imputation strategies in spatial metabolomics research.
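
The controlled dropout procedure described in the abstract can be sketched as follows: hide a known fraction of observed intensities, impute them, and score the imputed values against the hidden ground truth. This is a minimal sketch under stated assumptions, not the authors' pipeline. The helper names (`simulate_dropout`, `impute_feature_mean`, `rmse_on_masked`), the uniformly random masking scheme, the feature-mean imputer, and RMSE as the accuracy measure are all hypothetical stand-ins chosen for illustration.

```python
import math
import random


def simulate_dropout(matrix, rate, seed=0):
    """Hide a random fraction of observed entries (ASSUMPTION: uniform masking).

    `matrix` is a list of pixel rows, each a list of metabolite intensities,
    with None marking values that are already missing.
    Returns the masked copy and the list of (row, column) positions hidden.
    """
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, value in enumerate(row) if value is not None]
    n_mask = int(round(rate * len(observed)))
    positions = rng.sample(observed, n_mask)
    masked = [list(row) for row in matrix]
    for i, j in positions:
        masked[i][j] = None
    return masked, positions


def impute_feature_mean(matrix):
    """Placeholder imputer: fill each gap with the mean of its metabolite column.

    ASSUMPTION: stands in for any of the benchmarked methods (e.g. RF or GCN).
    """
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        values = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(values) / len(values) if values else 0.0)
    return [[means[j] if value is None else value
             for j, value in enumerate(row)] for row in matrix]


def rmse_on_masked(truth, imputed, positions):
    """Root-mean-square error restricted to the artificially hidden entries."""
    if not positions:
        raise ValueError("no masked positions to evaluate")
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


# Toy intensity matrix: 5 pixels x 4 metabolites (made-up values).
truth = [[float(i + j) for j in range(4)] for i in range(5)]
masked, positions = simulate_dropout(truth, rate=0.2, seed=1)
imputed = impute_feature_mean(masked)
print(f"masked entries: {len(positions)}")
print(f"RMSE on masked entries: {rmse_on_masked(truth, imputed, positions):.3f}")
```

Scoring only the masked positions keeps the comparison fair, since every imputer leaves the observed entries untouched and would otherwise be rewarded for values it never had to predict.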

- Eight imputation methods are benchmarked to address missing data in spatial metabolomics
- Performance is assessed on both imputation accuracy and spatial clustering preservation
- Random Forest and GCN emerge as top performers across multiple tissues and species

Bioinformatics; Omics; Metabolomics; Machine learning

## Linked entities

- **Species:** Mus musculus (taxon 10090), Homo sapiens (taxon 9606)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12999350/full.md

---
Source: https://tomesphere.com/paper/PMC12999350