SHAPCA: Consistent and Interpretable Explanations for Machine Learning Models on Spectroscopy Data
Mingxing Zhang, Nicola Rossberg, Simone Innocente, Katarzyna Komolibus, Rekha Gautam, Barry O'Sullivan, Luca Longo, and Andrea Visentin

TL;DR
SHAPCA is a novel machine learning pipeline that combines PCA and SHAP to provide consistent, interpretable explanations for spectroscopy data, linking model predictions back to original biological signals.
Contribution
The paper introduces SHAPCA, a method that improves the interpretability and stability of explanations for spectroscopy-based models by combining PCA with SHAP and expressing the resulting explanations in the original feature space.
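The idea of attributing a prediction to principal components and then mapping the attribution back to the original wavelengths can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear regressor on PCA scores, for which SHAP values under the feature-independence assumption have the closed form w_k (z_k - mean(z_k)), and it redistributes each component's attribution to wavelengths through the PCA loadings. The synthetic spectra, the variable names, and the choice of a linear model are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectra" (assumption): 60 samples x 200 wavelengths built from
# three smooth Gaussian bands, so neighbouring wavelengths are collinear.
n_samples, n_wavelengths, n_components = 60, 200, 3
grid = np.linspace(0.0, 1.0, n_wavelengths)
bands = np.stack([np.exp(-((grid - c) ** 2) / 0.005) for c in (0.25, 0.5, 0.75)])
conc = rng.normal(size=(n_samples, 3))
X = conc @ bands + 0.01 * rng.normal(size=(n_samples, n_wavelengths))
y = 2.0 * conc[:, 0] - 1.0 * conc[:, 2] + 0.05 * rng.normal(size=n_samples)

# PCA via SVD on mean-centred data; V holds the loadings.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:n_components].T              # shape (n_wavelengths, n_components)
Z = (X - mu) @ V                     # PC scores, zero mean per column

# Ordinary least squares on the PC scores (stand-in for any model).
A = np.column_stack([Z, np.ones(n_samples)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]
predictions = Z @ w + b

# Closed-form SHAP values for a linear model with the training set as
# background: phi_k = w_k * (z_k - mean(z_k)); the score means are zero here.
phi_pc = Z * w                       # shape (n_samples, n_components)

# Back-projection: z_k = sum_i (x_i - mu_i) * V[i, k], so each component's
# attribution splits across wavelengths as w_k * V[i, k] * (x_i - mu_i).
beta = V @ w                         # effective weight per wavelength
phi_wavelength = (X - mu) * beta     # shape (n_samples, n_wavelengths)

# Local view: one row of phi_wavelength. Global view: mean absolute value.
global_importance = np.abs(phi_wavelength).mean(axis=0)
```

In this linear setting the wavelength-level attributions sum to the same total as the component-level ones, so the explanation stays additive after back-projection. For a non-linear model the component-level SHAP values would have to come from an explainer rather than a closed form, and the split across wavelengths shown here is only one possible choice.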
Findings
Demonstrated improved explanation consistency across runs
Enabled interpretation of spectral features in original space
Provided both global and local model insights
Abstract
In recent years, machine learning models have been increasingly applied to spectroscopic datasets for chemical and biomedical analysis. For their successful adoption, particularly in clinical and safety-critical settings, professionals and researchers must be able to understand and trust the reasoning behind model predictions. However, the inherently high dimensionality and strong collinearity of spectroscopy data pose a fundamental challenge to model explainability. These properties not only complicate model training but also undermine the stability and consistency of explanations, leading to fluctuations in feature importance across repeated training runs. Feature extraction techniques have been used to reduce the input dimensionality, but the resulting features obscure the connection between the prediction and the original signal. This study proposes SHAPCA, an explainable machine learning…
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Spectroscopy Techniques in Biomedical and Chemical Research · Machine Learning in Materials Science
