Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection

Jianwei Zhang, Julie Liss, Suren Jayasuriya, and Visar Berisha

arXiv:2211.09858 · cs.SD · January 27, 2023 · 1 citation

TL;DR

This paper introduces a deep learning framework that generates vocal quality embeddings for dysphonic voice detection, achieving high accuracy and robustness across different datasets and deteriorated conditions.

Contribution

The paper proposes a deep learning model trained jointly with a contrastive loss and a classification loss, with data warping applied to the input voice samples, to produce robust vocal quality feature embeddings.
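To make the idea of data warping concrete, the sketch below stretches or compresses a waveform in time by resampling it with linear interpolation. This summary does not say which warping methods the paper uses, so the technique shown here, the function names `time_warp` and `random_warp`, and the 0.9 to 1.1 rate range are all assumptions chosen for illustration only.

```python
import random


def time_warp(samples, rate):
    """Resample a waveform by `rate` using linear interpolation.

    rate > 1 shortens the signal (faster playback); rate < 1 lengthens it.
    This is an illustrative warp, not necessarily the one used in the paper.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(samples) < 2:
        return list(samples)
    out_len = max(2, int(round(len(samples) / rate)))
    # Spread out_len positions evenly over the original index range.
    step = (len(samples) - 1) / (out_len - 1)
    warped = []
    for k in range(out_len):
        pos = k * step
        lo = min(int(pos), len(samples) - 2)  # left neighbour index
        frac = pos - lo                       # weight of the right neighbour
        warped.append((1.0 - frac) * samples[lo] + frac * samples[lo + 1])
    return warped


def random_warp(samples, low=0.9, high=1.1, rng=random):
    """Apply time_warp with a rate drawn uniformly from [low, high] (assumed range)."""
    return time_warp(samples, rng.uniform(low, high))
```

Calling `random_warp(waveform)` on each training sample at load time would give the model a slightly different version of the same voice on every pass, which is the general motivation for augmentation of this kind.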

Findings

01. High in-corpus and cross-corpus classification accuracy
02. Embeddings sensitive to voice quality and robust across datasets
03. Consistently outperforms baseline methods on various datasets

Abstract

Approximately 1.2% of the world's population has impaired voice production. As a result, automatic dysphonic voice detection has attracted considerable academic and clinical interest. However, existing methods for automated voice assessment often fail to generalize outside the training conditions or to other related applications. In this paper, we propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss is combined with a classification loss to train our deep learning model jointly. Data warping methods are used on input voice samples to improve the robustness of our method. Empirical results demonstrate that our method not only achieves high in-corpus and cross-corpus classification accuracy but also generates good embeddings sensitive to voice quality and robust across different…
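The joint training objective described in the abstract can be sketched as a classification loss plus a weighted contrastive loss over pairs of embeddings. The abstract does not give the exact formulation, so the margin-based pairwise contrastive loss, the all-pairs strategy, and the `weight` and `margin` parameters below are assumptions; the official repository is PyTorch, while this sketch uses plain Python so that it runs without dependencies.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def pairwise_contrastive(emb_a, emb_b, same_class, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings (assumed form).

    Same-class pairs are pulled together; different-class pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def joint_loss(embeddings, logits, labels, weight=0.5, margin=1.0):
    """Mean classification loss plus `weight` times the mean contrastive loss over all pairs."""
    n = len(labels)
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    con = sum(
        pairwise_contrastive(embeddings[i], embeddings[j], labels[i] == labels[j], margin)
        for i, j in pairs
    ) / max(len(pairs), 1)
    return ce + weight * con
```

With this form, a batch whose embeddings cluster by label and whose logits agree with the labels yields a loss near zero, while a batch whose same-label samples sit far apart is penalised by the contrastive term even if the classifier output is unchanged.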

Code & Models

Repositories

vigor-jzhang/dysphonic-emb (PyTorch, official)

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Music and Audio Processing