Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring
Yaman Kumar Singla, Avykat Gupta, Shaurya Bagga, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

TL;DR
This paper introduces a speaker-conditioned hierarchical deep learning model for automated speech scoring. The model uses multiple responses from the same candidate to improve scoring accuracy for non-native speakers.
Contribution
It presents a novel approach that incorporates speaker-specific context from multiple responses, enhancing the performance of automated speech scoring systems.
Findings
Speaker-conditioned modeling improved scoring performance by up to 12.86% over the compared baselines.
Incorporating speaker context significantly benefits scoring accuracy.
Both quantitative and qualitative analyses support the effectiveness of the approach.
Abstract
Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges like open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from a single audio, making them suffer from the lack of speaker-specific context required to model such a complex task. We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling. In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context vectors from these responses and feed them as additional speaker-specific context to our network to score a particular response. We compare our…
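The conditioning idea in the abstract (score one response while supplying context drawn from the same candidate's other responses) can be illustrated with a minimal sketch. This is not the authors' architecture: the mean-pooled context, the linear scoring head, and the names `speaker_context` and `score_responses` are all assumptions made for illustration, and the per-response embeddings are assumed to come from some upstream encoder.

```python
import numpy as np


def speaker_context(embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical context extractor (an assumption, not the paper's method).

    For each response of one speaker, return the mean embedding of that
    speaker's *other* responses. Input and output shape: (n_responses, dim).
    """
    n = embeddings.shape[0]
    if n < 2:
        # A single response has no other responses to draw context from.
        return np.zeros_like(embeddings)
    total = embeddings.sum(axis=0, keepdims=True)
    # Leave-one-out mean: subtract each row from the total, divide by n - 1.
    return (total - embeddings) / (n - 1)


def score_responses(embeddings: np.ndarray, weights: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """Hypothetical linear scoring head over [response ; speaker context]."""
    context = speaker_context(embeddings)
    features = np.concatenate([embeddings, context], axis=1)  # (n, 2 * dim)
    return features @ weights + bias


# Toy usage: three responses from one speaker, 2-dimensional embeddings.
responses = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
head = np.array([0.5, 0.5, 0.25, 0.25])  # illustrative weights, not learned
scores = score_responses(responses, head)
```

In a trained system the pooling and the scoring head would be learned components; the sketch only shows how a score for one response can depend on the candidate's remaining responses.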
