Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs

Yang Yuhang; Peng Yizhou; Eng Siong Chng; Xionghu Zhong

arXiv:2409.16005·cs.CL·September 25, 2024

Open Access

TL;DR

This paper introduces a Pinyin-to-Character pre-training method for large language models to improve Chinese speech recognition, achieving a 9.5% relative improvement on the AISHELL-1 corpus (19.0% with auxiliary data).

Contribution

The paper proposes pre-training an LLM on Pinyin embedding sequences so that it learns to generate the corresponding Chinese characters, followed by fine-tuning of LoRA parameters to adapt the LLM to Chinese speech input for ASR.
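
As a rough illustration of the LoRA idea mentioned above, the sketch below computes an adapted weight W + (alpha / r) · B · A from a frozen weight W and two small trainable matrices. It uses plain Python lists and hypothetical names (`matmul`, `lora_effective_weight`); the paper's actual ranks, scaling, and target layers are not stated here, so treat every value as an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A.

    w: frozen weight, shape (d_out, d_in)
    a: trainable matrix A, shape (r, d_in)
    b: trainable matrix B, shape (d_out, r)
    alpha: scaling hyperparameter (assumed value, not from the paper)
    """
    r = len(a)                      # LoRA rank
    delta = matmul(b, a)            # low-rank update, shape (d_out, d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example with rank r = 1 on a 2 x 2 weight.
W = [[1, 0], [0, 1]]
A = [[1, 2]]            # shape (1, 2)
B = [[1], [0]]          # shape (2, 1)
adapted = lora_effective_weight(W, A, B, alpha=2)
```

Only A and B would be trained in such a setup; with B set to all zeros, the adapted weight equals the frozen weight, which is the usual LoRA starting point.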

Findings

01. 9.5% relative improvement on AISHELL-1
02. 19.0% relative improvement with auxiliary data
03. Effective integration of pronunciation features into LLMs
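
For readers unfamiliar with the term, a relative improvement compares the reduction in error against the baseline error rather than reporting the absolute difference. The sketch below assumes the metric is an error rate where lower is better (such as character error rate); the numbers are made up for illustration and are not the paper's results.

```python
def relative_improvement(baseline_error, new_error):
    """Fraction by which the error rate drops relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, chosen only to illustrate the formula.
example = relative_improvement(baseline_error=4.00, new_error=3.62)
print(f"{example:.1%}")
```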

Abstract

The integration of large language models (LLMs) with pre-trained speech models has opened up new avenues in automatic speech recognition (ASR). While LLMs excel in multimodal understanding tasks, effectively leveraging their capabilities for ASR remains a significant challenge. This paper presents a novel training approach to enhance LLM performance in ASR tasks. We propose pre-training LLMs on Pinyin embedding sequences, which represent pronunciation features, to generate corresponding Chinese characters. This step enables the LLM to adapt to generating text from pronunciation features before encountering real speech data. Furthermore, we fine-tune the LoRA parameters to enhance the LLM's understanding of speech modality information. On the AISHELL-1 corpus, our approach yields a 9.5% relative improvement in ASR tasks compared to the baseline without Pinyin-to-Character pre-training.…
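
The Pinyin-to-Character step described in the abstract can be pictured as building (pronunciation, text) training pairs from text alone. The sketch below is a simplified, token-level illustration: it uses a tiny hand-written lexicon and hypothetical helper names (`TOY_LEXICON`, `to_pinyin`, `make_pair`), whereas the paper feeds Pinyin embedding sequences to the LLM. It is not the authors' pipeline.

```python
# Hypothetical toy lexicon: Chinese character -> tone-numbered Pinyin syllable.
# A real system would use a full pronunciation dictionary.
TOY_LEXICON = {"你": "ni3", "好": "hao3", "世": "shi4", "界": "jie4"}


def to_pinyin(text, lexicon=TOY_LEXICON, unk="<unk>"):
    """Map each character to its Pinyin syllable, or `unk` if it is not in the lexicon."""
    return [lexicon.get(ch, unk) for ch in text]


def make_pair(text):
    """Build one pre-training example: Pinyin sequence in, characters out."""
    return {"input": " ".join(to_pinyin(text)), "target": text}


pairs = [make_pair(sentence) for sentence in ["你好", "世界"]]
for pair in pairs:
    print(pair["input"], "->", pair["target"])
```

Because such pairs need only text, a model could in principle see many pronunciation-to-character examples before any real speech data is introduced, which is the motivation the abstract gives for this step.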

Taxonomy

Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Second Language Acquisition and Learning