Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

Yusuke Fujita; Tatsuya Komatsu; Yusuke Kida

arXiv:2204.00175·cs.CL·March 14, 2023·1 cites

TL;DR

This paper proposes a novel Japanese ASR method that uses intermediate syllable and character predictions to improve recognition accuracy, addressing pronunciation ambiguities inherent in Japanese kanji characters.

Contribution

It introduces an explicit interaction mechanism between characters and syllables using Self-conditioned CTC with intermediate predictions as conditioning features.

Findings

01. Outperformed conventional multi-task methods
02. Improved recognition accuracy on Japanese speech data
03. Effective handling of pronunciation ambiguities

Abstract

End-to-end automatic speech recognition directly maps input speech to characters. However, the mapping can be problematic when several different pronunciations should be mapped into one character or when one pronunciation is shared among many different characters. Japanese ASR suffers the most from such many-to-one and one-to-many mapping problems due to Japanese kanji characters. To alleviate the problems, we introduce explicit interaction between characters and syllables using Self-conditioned connectionist temporal classification (CTC), in which the upper layers are "self-conditioned" on the intermediate predictions from the lower layers. The proposed method utilizes character-level and syllable-level intermediate predictions as conditioning features to deal with mutual dependency between characters and syllables. Experimental results on Corpus of Spontaneous Japanese show that the…
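
The conditioning idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name, the layer count, the vocabulary sizes, the residual tanh layer standing in for a real encoder block, and the schedule (syllable, character, syllable, ... after every layer except the last) are all assumptions made for illustration. It only shows how an intermediate prediction could be projected back into the hidden state so that upper layers see it.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


class ToyAlternateConditionedEncoder:
    """Hypothetical toy encoder; all sizes and the schedule are assumptions."""

    def __init__(self, dim=16, n_layers=6, n_syllables=10, n_chars=30, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for real encoder blocks: one weight matrix per layer.
        self.layers = [rng.normal(0.0, 0.3, (dim, dim)) for _ in range(n_layers)]
        # Each vocabulary gets one extra entry for the CTC blank symbol.
        vocab = {"syllable": n_syllables + 1, "character": n_chars + 1}
        # Hidden -> vocabulary projections (prediction heads).
        self.to_vocab = {k: rng.normal(0.0, 0.3, (dim, v)) for k, v in vocab.items()}
        # Vocabulary -> hidden projections (conditioning features).
        self.to_hidden = {k: rng.normal(0.0, 0.3, (v, dim)) for k, v in vocab.items()}

    def forward(self, x):
        """x: (frames, dim). Returns final character posteriors and intermediates."""
        intermediates = []
        last = len(self.layers) - 1
        for i, w in enumerate(self.layers):
            x = x + np.tanh(x @ w)  # toy residual layer
            if i == last:
                break
            # Assumed schedule: alternate syllable and character targets.
            kind = "syllable" if i % 2 == 0 else "character"
            posterior = softmax(x @ self.to_vocab[kind])
            intermediates.append((kind, posterior))
            # Condition the upper layers on this intermediate prediction.
            x = x + posterior @ self.to_hidden[kind]
        final = softmax(x @ self.to_vocab["character"])
        return final, intermediates


encoder = ToyAlternateConditionedEncoder()
final, intermediates = encoder.forward(np.zeros((5, 16)))
print(final.shape, [kind for kind, _ in intermediates])
```

In training, a CTC loss would presumably be applied to each intermediate posterior against the matching syllable or character transcript, as well as to the final output; that part is omitted here.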

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Phonetics and Phonology Research