Optimizing Byte-level Representation for End-to-end ASR
Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang

TL;DR
This paper introduces a method for optimizing byte-level representations for end-to-end ASR using an auto-encoder and vector quantization. In multilingual speech recognition, the optimized representation is more accurate than standard UTF-8 encoding.
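To make the vector-quantization step concrete, here is a minimal sketch of one way a continuous per-character latent vector could become a short byte sequence. The latent is split into segments, and each segment is replaced by the index of its nearest entry in a 256-entry codebook, so each index fits in one byte. The sizes (`NUM_BYTES`, `SEG_DIM`), the random codebooks standing in for learned ones, and the helper names `quantize` and `dequantize` are all hypothetical. The paper's auto-encoder and training procedure are not reproduced here.

```python
import random

# Hypothetical sizes (assumptions, not from the paper):
NUM_BYTES = 2        # bytes produced per character
CODEBOOK_SIZE = 256  # one codebook entry per possible byte value
SEG_DIM = 4          # dimension of each latent segment

random.seed(0)
# One codebook per byte position. Random vectors stand in for learned codewords.
codebooks = [
    [[random.gauss(0.0, 1.0) for _ in range(SEG_DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(NUM_BYTES)
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent):
    """Split the latent into NUM_BYTES segments and replace each segment with
    the index of its nearest codeword. The indices form the byte sequence."""
    assert len(latent) == NUM_BYTES * SEG_DIM
    codes = []
    for i in range(NUM_BYTES):
        seg = latent[i * SEG_DIM:(i + 1) * SEG_DIM]
        nearest = min(range(CODEBOOK_SIZE), key=lambda k: sq_dist(seg, codebooks[i][k]))
        codes.append(nearest)
    return bytes(codes)


def dequantize(byte_seq):
    """Map a byte sequence back to its concatenated codewords (the decoder's input)."""
    out = []
    for i, b in enumerate(byte_seq):
        out.extend(codebooks[i][b])
    return out


if __name__ == "__main__":
    # Quantizing a codeword vector recovers the bytes that produced it.
    print(list(quantize(dequantize(bytes([7, 200])))))  # [7, 200]
```

Quantizing each segment independently keeps every output symbol within a 256-way vocabulary, which is what makes the learned codes usable as byte-level ASR targets.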
Contribution
It presents a novel framework that optimizes byte-level representations specifically for ASR. The framework can incorporate information from multiple modalities, provides an error correction mechanism, and outperforms standard UTF-8 encoding.
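The following is a rough illustration of why a learned decoder can offer error correction that UTF-8 cannot. Strict UTF-8 decoding rejects a malformed byte sequence outright. A decoder that maps any byte sequence to its closest known character, by contrast, always returns something. The sketch assumes a hypothetical fixed code table (`CODE_TABLE`) and uses nearest-code decoding by counting mismatched bytes as a stand-in for the paper's neural decoder. It is not the authors' mechanism.

```python
# Hypothetical learned code table: each character has a fixed 2-byte code.
CODE_TABLE = {
    "你": bytes([12, 40]),
    "好": bytes([12, 41]),
    "a": bytes([200, 3]),
    "b": bytes([201, 3]),
}


def decode_nearest(byte_seq):
    """Return the character whose code differs from byte_seq in the fewest
    byte positions (assumes all codes have the same length as byte_seq)."""
    def mismatches(code):
        return sum(x != y for x, y in zip(code, byte_seq))
    return min(CODE_TABLE, key=lambda ch: mismatches(CODE_TABLE[ch]))


def decode_utf8(byte_seq):
    """Strict UTF-8 decoding: a malformed sequence yields None instead of text."""
    try:
        return byte_seq.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(decode_nearest(bytes([99, 41])))   # 好 (first byte corrupted, still recovered)
    print(decode_utf8(bytes([0xE4, 0xBD])))  # None (truncated UTF-8 for 你)
```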
Findings
Achieved a 5% relative error rate reduction on English/Mandarin ASR
Demonstrated that optimized byte representations outperform UTF-8
Proposed a framework that can integrate information from multiple modalities
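For reference, a relative error rate reduction (RER) measures the change in error rate against the baseline error rate. Here it compares the optimized system's error rate $E_{\text{opt}}$ with the UTF-8 baseline $E_{\text{UTF-8}}$:

```latex
\mathrm{RER} = \frac{E_{\text{UTF-8}} - E_{\text{opt}}}{E_{\text{UTF-8}}}
```

With hypothetical numbers (not from the paper), a baseline of 10.0% falling to 9.5% gives (10.0 - 9.5) / 10.0 = 0.05, a 5% relative reduction, even though the absolute drop is only 0.5 points.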
Abstract
We propose a novel approach to optimizing a byte-level representation for end-to-end automatic speech recognition (ASR). Large-scale multilingual ASR systems often use a byte-level representation when the character set of the supported languages is large. Because byte-level representation is compact and universal, ASR models can use smaller output vocabularies, which provides more flexibility. UTF-8 is a commonly used byte-level representation for multilingual ASR, but it is not designed to optimize machine learning tasks directly. By using an auto-encoder and vector quantization, we show that we can optimize a byte-level representation for ASR and achieve better accuracy. Our proposed framework can incorporate information from different modalities and provides an error correction mechanism. In an English/Mandarin dictation task, we show that a bilingual ASR model…
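Below is a minimal sketch of the UTF-8 byte-level representation that serves as the baseline. Every character is encoded as one to four bytes, so the output vocabulary is fixed at 256 symbols regardless of how many characters the supported languages contain. The helper names are illustrative. Only Python's built-in `str.encode` is used.

```python
VOCAB_SIZE = 256  # every possible byte value, independent of language


def utf8_byte_tokens(text):
    """Byte-level targets: each character becomes 1-4 UTF-8 bytes (ints in 0..255)."""
    return list(text.encode("utf-8"))


def tokens_per_char(text):
    """Number of UTF-8 bytes each character contributes to the target sequence."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


if __name__ == "__main__":
    print(utf8_byte_tokens("hi 你好"))  # [104, 105, 32, 228, 189, 160, 229, 165, 189]
    print(tokens_per_char("hi 你好"))   # [('h', 1), ('i', 1), (' ', 1), ('你', 3), ('好', 3)]
```

Each Mandarin character takes three bytes, while each Latin character takes one. The paper replaces this fixed, hand-designed assignment with a representation optimized for ASR.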
