Residual Attention Based Network for Automatic Classification of Phonation Modes
Xiaoheng Sun, Yiliang Jiang, Wei Li

TL;DR
This paper introduces a Residual Attention based network for automatic classification of phonation modes in singing. A soft mask branch lets the network focus on relevant regions of the input, and the models outperform previous methods on three of the four datasets tested.
Contribution
The study proposes a novel Residual Attention based network that improves phonation mode classification accuracy over existing approaches.
Findings
Achieved up to 94.58% classification accuracy.
Outperformed previous methods on three of the four datasets evaluated.
Used a soft mask branch so the network can focus on specific areas of the input.
Abstract
Phonation mode is an essential characteristic of singing style as well as an important expression of performance. It can be classified into four categories, called neutral, breathy, pressed and flow. Previous studies used voice quality features and feature engineering for classification. While deep learning has achieved significant progress in other fields of music information retrieval (MIR), there have been few attempts at the classification of phonation modes. In this study, a Residual Attention based network is proposed for automatic classification of phonation modes. The network consists of a convolutional network performing feature processing and a soft mask branch enabling the network to focus on a specific area. In comparison experiments, the models with the proposed network achieve better results on three of the four datasets than previous works, among which the highest classification…
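The combination of a convolutional feature branch and a soft mask branch described in the abstract can be sketched as follows. This is a minimal illustration only: it assumes the common residual attention formulation in which a mask M with values in (0, 1) re-weights trunk features T as (1 + M) * T. The abstract does not state the paper's exact combination rule, and the function names, array shapes, and random inputs here are hypothetical stand-ins for real convolutional outputs.

```python
import numpy as np


def sigmoid(z):
    """Squash mask logits into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def residual_attention(trunk_features, mask_logits):
    """Re-weight trunk features with a soft mask as (1 + M) * T.

    Assumed formulation, not confirmed by the abstract: the "+ 1" keeps the
    original trunk features intact where the mask is near zero, so the mask
    can only emphasize features, never erase them.
    """
    soft_mask = sigmoid(mask_logits)
    return (1.0 + soft_mask) * trunk_features


# Hypothetical stand-ins for the outputs of the two branches,
# e.g. a (frequency bins x time frames) feature map.
rng = np.random.default_rng(0)
trunk = rng.standard_normal((4, 8))
logits = rng.standard_normal((4, 8))

attended = residual_attention(trunk, logits)
```

Under this assumption each output value lies between one and two times the corresponding trunk value in magnitude, which is why the mask acts as a soft emphasis rather than a hard selection.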
Taxonomy
Topics: Music and Audio Processing · Voice and Speech Disorders · Speech Recognition and Synthesis
