Overcoming missing data in spatial metabolomics with machine learning imputation to accelerate downstream discovery
Tingze Feng, Yuhan Wang, Shaojun Pei, Qiuping Wang, Yirong Li, Jing Lv, Tian Xia, Di Chen, Hai-long Piao

TL;DR
This study compares eight methods for filling in missing data in spatial metabolomics, finding that Random Forest and a graph-based method work best for preserving data accuracy and spatial patterns.
Contribution
A novel graph convolutional network method and a dual-criteria benchmark framework for evaluating imputation in spatial metabolomics.
Findings
Random Forest achieved the highest overall performance in imputation accuracy and spatial cluster preservation.
A graph convolutional network (GCN) ranked second in both imputation accuracy and spatial structure preservation.
The benchmark framework was applied across six datasets from mouse, human, and plant tissues.
Abstract
Mass spectrometry imaging (MSI)-based spatial metabolomics exhibits extensive missing values; yet, practical guidance on how imputation choices affect both imputation accuracy and downstream spatial analyses remains limited. In this study, we evaluated eight imputation methods, including both existing approaches and a graph convolutional network (GCN)-based method specifically designed for spatial metabolomics data, to identify suitable approaches for spatial metabolomics. To enable comprehensive assessment, we developed an evaluation framework focusing on two objective criteria: (a) imputation accuracy and (b) preservation of spatial cluster structure. We assembled six benchmark datasets spanning mouse brain and liver, human kidney and stomach, and plant seed sections, and conducted controlled dropout simulations of missing values. Across both evaluation dimensions, including…
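The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the random (uniform) dropout scheme, the column-mean baseline imputer, and the choice of RMSE and the adjusted Rand index (ARI) as stand-ins for "imputation accuracy" and "preservation of spatial cluster structure". The paper's actual metrics and methods may differ.

```python
import math
import random
from collections import Counter

def simulate_dropout(matrix, rate, seed=0):
    """Hide a random fraction of entries (set to None); return the copy and the hidden positions.

    Assumption: uniform random dropout; the paper's simulation scheme may differ.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    masked = set(rng.sample(cells, int(round(rate * len(cells)))))
    dropped = [[None if (i, j) in masked else v for j, v in enumerate(row)]
               for i, row in enumerate(matrix)]
    return dropped, masked

def impute_feature_mean(matrix):
    """Hypothetical baseline: fill each gap with the mean of the observed values in its column."""
    means = []
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        # Fall back to 0.0 if a whole column was hidden.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]

def masked_rmse(truth, imputed, masked):
    """Criterion (a), assumed metric: RMSE over the artificially hidden entries only."""
    if not masked:
        return 0.0
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))

def adjusted_rand_index(labels_a, labels_b):
    """Criterion (b), assumed metric: agreement between two cluster labelings of the same pixels."""
    total = math.comb(len(labels_a), 2)
    sum_ab = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivially identical
    return (sum_ab - expected) / (max_index - expected)

# Toy intensity matrix: rows are pixels, columns are metabolite features.
truth = [[1.0, 10.0, 5.0],
         [1.2, 11.0, 5.5],
         [4.0, 2.0, 0.5],
         [4.1, 2.2, 0.4]]
dropped, masked = simulate_dropout(truth, rate=0.25, seed=1)
imputed = impute_feature_mean(dropped)
print("RMSE on hidden entries:", masked_rmse(truth, imputed, masked))
```

In a fuller benchmark of this shape, the same clustering procedure would be run on the original and the imputed matrices, and `adjusted_rand_index` would compare the two resulting label vectors; an ARI near 1 would indicate that imputation left the cluster structure largely intact.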
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Machine Learning in Bioinformatics · Cell Image Analysis Techniques
