# Parkinson's disease detection using spectrogram-based multi-model feature fusion networks

**Authors:** Wenna Chen, Rongfu Lv, Xiaowei Du, Xiangyu Chen, Hao Wang, Jincan Zhang, Ganqin Du

PMC · DOI: 10.3389/fneur.2025.1706317 · 2025-11-05

## TL;DR

This paper introduces a non-invasive method for detecting Parkinson's disease using voice recordings and deep learning models that combine features from multiple neural networks.

## Contribution

The main contribution is a multi-model feature fusion approach: spectrograms of voice recordings are passed through pre-trained CNNs, and the extracted features are fused by summation to improve PD detection accuracy.

## Key findings

- The fusion of MobileNetV3-Large and ShuffleNetV2 achieved 95.56% accuracy and an AUC of 0.99 in PD classification.
- Under 5-fold cross-validation, the feature-fusion models outperformed the individual models on every evaluation metric.
- According to the authors, fusing features from multiple models captures subtle pathological speech patterns in PD more effectively than a single model does.

## Abstract

Parkinson's disease (PD) is a common neurodegenerative disorder. Traditional diagnostic methods, which rely on clinical assessment and imaging, are often invasive and costly and require specialized personnel, posing barriers to early detection. Because approximately 90% of PD patients develop vocal impairments, vocal analysis has emerged as a promising non-invasive diagnostic tool. However, individual deep learning models are often limited by overfitting and poor generalizability.

This study proposes a PD classification method using spectrogram feature fusion with pre-trained convolutional neural networks (CNNs). Voice recordings were obtained from 61 PD patients and 70 healthy controls (HC) at the First Affiliated Hospital of Henan University of Science and Technology. Preprocessing the raw speech signals yielded 2,476 spectrograms. Three pre-trained models, DenseNet121, MobileNetV3-Large, and ShuffleNetV2, were used for feature extraction. The output of MobileNetV3-Large was adjusted using a 1 × 1 convolutional layer to ensure dimensional alignment before features were fused via summation.
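The align-then-sum step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their repository holds that): the channel counts (3 and 4), the 2 × 2 spatial size, the weights, and the helper names `conv1x1` and `fuse_by_sum` are all hypothetical. A 1 × 1 convolution is a per-position linear map across channels, so it changes the channel count without altering the spatial layout, which is what makes element-wise summation possible.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def conv1x1(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Apply a 1x1 convolution: at each (row, col), mix input channels linearly.

    weight has shape [out_channels][in_channels]; bias has length out_channels.
    """
    in_ch = len(x)
    rows, cols = len(x[0]), len(x[0][0])
    out = []
    for o, w_row in enumerate(weight):
        assert len(w_row) == in_ch, "weight row must match input channels"
        plane = [
            [bias[o] + sum(w_row[c] * x[c][r][k] for c in range(in_ch)) for k in range(cols)]
            for r in range(rows)
        ]
        out.append(plane)
    return out


def fuse_by_sum(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two feature maps with identical shapes."""
    assert len(a) == len(b), "channel counts must match before summation"
    return [
        [[va + vb for va, vb in zip(row_a, row_b)] for row_a, row_b in zip(pa, pb)]
        for pa, pb in zip(a, b)
    ]


# Hypothetical toy features: branch A has 3 channels, branch B has 4, both 2x2.
branch_a = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(3)]
branch_b = [[[10.0] * 2 for _ in range(2)] for _ in range(4)]

# Hypothetical 1x1 weights projecting 3 channels to 4 (learned in a real model).
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
bias = [0.0, 0.0, 0.0, 0.0]

aligned_a = conv1x1(branch_a, weight, bias)  # now 4 channels, still 2x2
fused = fuse_by_sum(aligned_a, branch_b)     # same shape as branch_b
```

In a deep learning framework the projection would be a convolution layer with kernel size 1 whose weights are learned. For reference, torchvision's stock MobileNetV3-Large feature extractor outputs 960 channels, while DenseNet121 and ShuffleNetV2 (x1.0) output 1,024; that would be consistent with only MobileNetV3-Large needing the projection, although the abstract does not state the dimensions or the model variants used.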

Evaluation using 5-fold cross-validation demonstrated that models employing feature fusion consistently outperformed individual models across all metrics. Specifically, the fusion of MobileNetV3-Large and ShuffleNetV2 achieved the highest accuracy of 95.56% and an AUC of 0.99. Comparative experiments with existing state-of-the-art methods confirmed the competitive performance of the proposed approach.
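To make the evaluation protocol concrete, here is a minimal sketch of 5-fold index splitting together with accuracy and AUC computation in plain Python. The helper names (`k_fold_indices`, `accuracy`, `roc_auc`), the toy labels, and the toy scores are hypothetical. The abstract does not say whether folds were split by spectrogram or by speaker, and this sketch makes no claim about that.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of nearly equal size
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Each sample lands in exactly one test fold across the five splits.
splits = k_fold_indices(n=20, k=5)

# Hypothetical labels (1 = PD, 0 = HC) and model scores for one test fold.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # threshold at 0.5

toy_acc = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

In a full run, each (train, test) pair would train a fresh model on the training indices, and the per-fold metrics would then be averaged to give the reported figures.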

The fusion of multi-model features more effectively captures subtle pathological signatures in PD speech, overcoming the limitations of single models. This method provides a reliable, low-cost, and non-invasive tool for auxiliary PD diagnosis, with significant potential for clinical application. The code is available at https://github.com/lvrongfu/pjs.

## Linked entities

- **Diseases:** Parkinson's disease (MONDO:0005180)

## Full-text entities

- **Diseases:** neurodegenerative disorder (MESH:D019636), PD (MESH:D010300), vocal impairments (MESH:D013981)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12631296/full.md

---
Source: https://tomesphere.com/paper/PMC12631296