SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Jiaxu Zhu; Changhe Song; Zhiyong Wu; Helen Meng

arXiv:2309.01437 · cs.SD · October 10, 2023

TL;DR

SememeASR integrates sememe-based semantic knowledge into end-to-end speech recognition, improving recognition under domain mismatch and on long-tailed data.

Contribution

This paper introduces sememe-based semantic knowledge into speech recognition, using a knowledge-driven approach to address the domain-mismatch and long-tailed data problems that purely data-driven models struggle with.

Findings

01 Integrating sememe information improves speech recognition accuracy.

02 Sememe knowledge improves the model's recognition of long-tailed data.

03 Sememe knowledge improves the model's domain generalization.

Abstract

Recently, excellent progress has been made in speech recognition. However, purely data-driven approaches still struggle with domain mismatch and long-tailed data. Considering that knowledge-driven approaches can help data-driven approaches alleviate their flaws, we introduce sememe-based semantic knowledge into speech recognition (SememeASR). A sememe, by linguistic definition, is the minimum semantic unit of a language and can represent the implicit semantic information behind each word. Our experiments show that introducing sememe information improves the effectiveness of speech recognition. Further experiments show that sememe knowledge improves the model's recognition of long-tailed data and enhances its domain generalization ability.
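To make the notion of a sememe concrete, the following is a minimal sketch of how words can be described by sets of sememes and encoded as multi-hot vectors. It illustrates the concept only and is not the paper's model: the toy lexicon, the sememe labels, and the helper names `sememe_vocab`, `multi_hot`, and `jaccard` are all assumptions made up for this example.

```python
# Toy sememe lexicon. Every entry is invented for illustration; it is NOT
# taken from a real sememe knowledge base or from the paper.
TOY_SEMEMES = {
    "doctor": {"human", "occupation", "medical"},
    "nurse": {"human", "occupation", "medical"},
    "cardiologist": {"human", "occupation", "medical", "heart"},
    "apple": {"fruit", "food"},
}


def sememe_vocab(lexicon):
    """Return a sorted list of every sememe that appears in the lexicon."""
    return sorted(set().union(*lexicon.values()))


def multi_hot(word, lexicon, vocab):
    """Encode a word as a 0/1 vector over the sememe vocabulary.

    Words missing from the lexicon get an all-zero vector.
    """
    sememes = lexicon.get(word, set())
    return [1 if s in sememes else 0 for s in vocab]


def jaccard(word_a, word_b, lexicon):
    """Jaccard overlap between the sememe sets of two words."""
    a, b = lexicon[word_a], lexicon[word_b]
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    vocab = sememe_vocab(TOY_SEMEMES)
    print(vocab)
    print(multi_hot("cardiologist", TOY_SEMEMES, vocab))
    print(jaccard("cardiologist", "doctor", TOY_SEMEMES))
    print(jaccard("cardiologist", "apple", TOY_SEMEMES))
```

In this toy lexicon, the rarer word "cardiologist" shares most of its sememes with the more common word "doctor" and none with "apple". One plausible reading of the abstract is that this kind of sharing lets a rare word benefit from what a model has learned about frequent words; that reading is an interpretation, not a claim made in the text above.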

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques