Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies

Chung-Wen Wu; Berlin Chen

arXiv:2406.10873 · cs.SD · June 18, 2024

Open Access

TL;DR

This paper introduces W-RankSim regularization and hybrid feature fusion strategies to improve automatic speech assessment by addressing data imbalance and leveraging both self-supervised and handcrafted features.

Contribution

It presents a novel W-RankSim regularization technique for ordinal classification and demonstrates the benefits of combining SSL and handcrafted features in ASA.
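The page does not say how the SSL and handcrafted features are combined (the abstract below is truncated at that point). As a purely illustrative baseline, the simplest hybrid is early fusion by concatenation. The sketch below assumes mean pooling of frame-level SSL vectors, z-normalization of handcrafted features with precomputed statistics, and the function names `mean_pool`, `zscore` and `fuse`; all of these are hypothetical choices, not the authors' method.

```python
def mean_pool(frames):
    """Average frame-level SSL vectors (T frames x D dims) into one D-dim utterance vector."""
    num_frames = len(frames)
    return [sum(frame[d] for frame in frames) / num_frames for d in range(len(frames[0]))]


def zscore(values, means, stds):
    """Standardize handcrafted features with (assumed) training-set statistics; constant features map to 0."""
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(values, means, stds)]


def fuse(ssl_frames, handcrafted, hc_means, hc_stds):
    """Hypothetical early fusion: concatenate the pooled SSL vector with standardized handcrafted features."""
    return mean_pool(ssl_frames) + zscore(handcrafted, hc_means, hc_stds)


# Toy input: two 2-dim SSL frames plus two hypothetical handcrafted features.
fused = fuse([[1.0, 2.0], [3.0, 4.0]], [120.0, 3.0], [100.0, 2.0], [20.0, 0.5])
print(fused)  # [2.0, 3.0, 1.0, 2.0]
```

Standardizing the handcrafted part keeps features with large raw ranges from dominating the concatenated vector; a learned projection or attention-based fusion would be equally plausible alternatives.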

Findings

01

W-RankSim improves ordinal classification accuracy in ASA.

02

Hybrid feature fusion enhances system performance.

03

Experimental results confirm the effectiveness of the proposed methods.

Abstract

Automatic Speech Assessment (ASA) has seen notable advancements in recent research through the use of self-supervised learning (SSL) features. However, a key challenge in ASA is the imbalanced distribution of data, which is particularly evident in English test datasets. To address this challenge, we approach ASA as an ordinal classification task and introduce Weighted Vectors Ranking Similarity (W-RankSim) as a novel regularization technique. W-RankSim encourages the weight vectors in the output layer to lie closer together for similar classes, which implies that feature vectors with similar labels are gradually nudged closer to each other as they converge toward their corresponding weight vectors. Extensive experimental evaluations confirm the effectiveness of our approach in improving ordinal classification performance for ASA. Furthermore, we propose a hybrid model that combines SSL and handcrafted…
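RankSim-style regularizers compare, for each anchor item, the ranking of its label similarities with the ranking of its representation similarities. Following the abstract, the representations here are the output-layer weight vectors of the ordinal classes. The sketch below only computes such a penalty value under our own assumptions: cosine similarity between weight vectors, label similarity defined as -|i - j| between class indices, average ranks for ties, and a mean squared rank difference. The name `w_ranksim_penalty` is hypothetical. Because ranking is piecewise constant, using this as a training loss would need a differentiable ranking surrogate, which this sketch omits.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values, tol=1e-9):
    """Rank in descending order (largest value gets rank 1); near-ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: -values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend the tie group while the next value is within tol of the group's first value.
        while j + 1 < len(order) and math.isclose(values[order[j + 1]], values[order[i]], abs_tol=tol):
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def w_ranksim_penalty(weights):
    """Hypothetical rank-mismatch penalty on output-layer weight vectors.

    weights[c] is the weight vector of ordinal class c (c = 0..C-1).
    Label similarity between classes i and j is assumed to be -|i - j|.
    """
    num_classes = len(weights)
    total = 0.0
    for i in range(num_classes):
        label_sim = [-abs(i - j) for j in range(num_classes)]
        weight_sim = [cosine(weights[i], weights[j]) for j in range(num_classes)]
        r_label = average_ranks(label_sim)
        r_weight = average_ranks(weight_sim)
        # Squared difference between the two rank vectors for anchor class i.
        total += sum((a - b) ** 2 for a, b in zip(r_label, r_weight)) / num_classes
    return total / num_classes


def unit(deg):
    """2-D unit vector at the given angle, used only to build toy weight vectors."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]


aligned = [unit(a) for a in (0, 30, 60, 90)]   # adjacent scores have adjacent weight vectors
shuffled = [unit(a) for a in (0, 90, 60, 30)]  # classes 1 and 3 swapped
print(w_ranksim_penalty(aligned))   # 0.0
print(w_ranksim_penalty(shuffled))  # a positive value
```

When neighbouring score levels have the most similar weight vectors, the two rankings agree and the penalty is zero; swapping two classes breaks that ordinal structure and the penalty becomes positive, which is the behaviour the regularizer is meant to discourage.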

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing