Attentive-based Multi-level Feature Fusion for Voice Disorder Diagnosis

Lipeng Shen; Yifan Xiong; Dongyue Guo; Wei Mo; Lingyu Yu; Hui Yang; Yi Lin

arXiv:2410.04797 · cs.SD · October 8, 2024

TL;DR

This paper introduces a two-stage framework that combines pre-trained models with an attentive fusion module to improve the accuracy of voice disorder diagnosis from raw audio, mitigating the limited size of available datasets and improving how features are integrated.

Contribution

The study presents a multi-level feature fusion framework that combines ECAPA-TDNN and Wav2vec 2.0 feature extractors with an attentive fusion module for voice disorder detection.

Findings

01. Achieves 90.51% accuracy on the FEMH dataset
02. Outperforms baseline methods in voice disorder classification
03. Demonstrates effective multi-level feature fusion for diagnosis

Abstract

Voice disorders negatively affect the quality of daily life in many ways. However, accurately recognizing the category of pathological features from raw audio remains a considerable challenge because available datasets are limited. A promising way to address this issue is to extract multi-level pathological information from speech comprehensively by fusing features in the latent space. In this paper, a novel framework is designed to explore high-quality feature fusion for effective and generalized detection performance. Specifically, the proposed model follows a two-stage training paradigm: (1) ECAPA-TDNN and Wav2vec 2.0, which have shown remarkable effectiveness in various domains, are employed to learn universal pathological information from raw audio; (2) an attentive fusion module is specifically designed to establish the interaction between pathological features…
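The abstract is truncated and does not describe the internals of the attentive fusion module. As a minimal sketch under stated assumptions, the code below treats each upstream embedding (for example an utterance-level ECAPA-TDNN vector and a pooled Wav2vec 2.0 vector, both assumed to be already projected to a shared dimension) as one token, lets the tokens interact through single-head scaled dot-product self-attention, and averages the result into one fused vector. The function names (matvec, softmax, attentive_fusion) and the projection matrices w_q, w_k, w_v are hypothetical; this is a generic cross-source attention, not the authors' module.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_out, d_in)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(features: List[Vector], w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Vector:
    """Hypothetical fusion: one token per feature source, single-head scaled
    dot-product self-attention across sources, then a mean over attended tokens."""
    queries = [matvec(w_q, f) for f in features]
    keys = [matvec(w_k, f) for f in features]
    values = [matvec(w_v, f) for f in features]
    d_k = len(keys[0])
    d_v = len(values[0])
    attended = []
    for q in queries:
        # Each source attends to every source, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(d_v)])
    # Average the attended tokens into a single fused vector for a downstream classifier.
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(d_v)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    ecapa_vec = [0.9, 0.1]    # stand-in for a projected ECAPA-TDNN embedding (assumed)
    wav2vec_vec = [0.2, 0.7]  # stand-in for a projected, pooled Wav2vec 2.0 embedding (assumed)
    print(attentive_fusion([ecapa_vec, wav2vec_vec], identity, identity, identity))
```

In a setup like the one the abstract outlines, the fused vector would presumably feed a classifier over disorder categories, with the projection matrices learned during the second training stage while the pre-trained extractors supply fixed features; both points are assumptions about typical usage rather than details confirmed by the source.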

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis