Residual Attention Based Network for Automatic Classification of Phonation Modes

Xiaoheng Sun; Yiliang Jiang; Wei Li

arXiv:2107.08425 · eess.AS · July 20, 2021

Open Access

TL;DR

This paper introduces a Residual Attention based network for automatic classification of phonation modes in singing. A soft mask branch lets the network focus on a specific area of the features, and the resulting models outperform previous methods on three of four datasets.

Contribution

The study proposes a Residual Attention based network that improves phonation mode classification accuracy over previous work on three of the four datasets tested.

Findings

1. Achieved up to 94.58% classification accuracy.

2. Outperformed previous methods on three of the four datasets.

3. Improved focus on relevant features through the soft mask attention branch.

Abstract

Phonation mode is an essential characteristic of singing style as well as an important means of expression in performance. It can be classified into four categories: neutral, breathy, pressed and flow. Previous studies used voice quality features and feature engineering for classification. While deep learning has achieved significant progress in other fields of music information retrieval (MIR), there have been few attempts to apply it to the classification of phonation modes. In this study, a Residual Attention based network is proposed for automatic classification of phonation modes. The network consists of a convolutional network performing feature processing and a soft mask branch enabling the network to focus on a specific area. In comparison experiments, models using the proposed network achieve better results than previous works on three of the four datasets, among which the highest classification…
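The abstract does not give layer details, so the following is only a minimal sketch of how a soft mask branch can be combined with convolutional features. It assumes the attention residual learning rule from the original Residual Attention Network design, output = (1 + M) * T, where T is the trunk (feature processing) output and M is a soft mask in (0, 1). The names `trunk`, `mask_logits`, and `residual_attention` are hypothetical, the toy values are made up for illustration, and the paper's actual combination rule may differ.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real number to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_mask(mask_logits):
    """Turn raw mask-branch outputs into per-position weights in (0, 1)."""
    return [[sigmoid(v) for v in row] for row in mask_logits]


def residual_attention(trunk, mask_logits):
    """Combine trunk features T with soft mask M as (1 + M) * T.

    Assumption: this is the attention residual learning rule of the original
    Residual Attention Network, not necessarily the exact rule in this paper.
    The '1 +' term keeps the trunk features even where the mask is near 0.
    """
    mask = soft_mask(mask_logits)
    return [
        [(1.0 + m) * t for m, t in zip(mask_row, trunk_row)]
        for mask_row, trunk_row in zip(mask, trunk)
    ]


# Toy 2x2 feature map (e.g. frequency x time); values are illustrative only.
trunk = [[1.0, -2.0], [0.5, 4.0]]
mask_logits = [[0.0, 10.0], [-10.0, 0.0]]  # high logit = "attend here"

out = residual_attention(trunk, mask_logits)
for row in out:
    print(row)
```

Positions with a large mask logit are scaled by almost 2, while positions with a very negative logit pass through nearly unchanged, so the mask emphasizes an area without erasing the rest of the feature map.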

Taxonomy

Topics: Music and Audio Processing · Voice and Speech Disorders · Speech Recognition and Synthesis