Improving Speech Recognition Accuracy of Local POI Using Geographical Models

Songjun Cao; Yike Zhang; Xiaobing Feng; Long Ma

arXiv:2107.03165 · eess.AS · July 8, 2021

Open Access

TL;DR

This paper enhances local POI speech recognition by introducing a geographic acoustic model and geo-specific language models, effectively addressing dialect diversity and long-tail POI recognition challenges.

Contribution

It proposes a novel geographic acoustic model (Geo-AM) and geo-specific language models (Geo-LMs) that adapt to the user's location, improving recognition accuracy on local POI queries.
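A minimal sketch can illustrate location-dependent Geo-LM selection. It assumes several things that do not come from the paper: a hypothetical table of region centers, toy unigram LMs stored as dictionaries, nearest-region lookup by great-circle distance, and linear interpolation between a general LM and the selected Geo-LM with a hypothetical weight `lam`. The sketch only shows why a local POI name scores higher once the matching Geo-LM is chosen.

```python
import math

# Hypothetical region table: region name -> (latitude, longitude) of its center.
REGION_CENTERS = {
    "shenzhen": (22.54, 114.06),
    "beijing": (39.90, 116.40),
    "chengdu": (30.66, 104.07),
}

# Hypothetical toy unigram LMs (word -> probability). Real Geo-LMs would be
# trained on each region's POI names; these numbers are illustrative only.
GENERAL_LM = {"腾讯大厦": 1e-6, "天安门": 1e-4, "<unk>": 1e-8}
GEO_LMS = {
    "shenzhen": {"腾讯大厦": 1e-3, "<unk>": 1e-7},
    "beijing": {"天安门": 1e-2, "<unk>": 1e-7},
    "chengdu": {"宽窄巷子": 1e-3, "<unk>": 1e-7},
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def select_region(lat, lon):
    """Assumed selection rule: pick the region whose center is nearest to the user."""
    return min(REGION_CENTERS, key=lambda r: haversine_km((lat, lon), REGION_CENTERS[r]))


def interpolated_logprob(word, region, lam=0.5):
    """Assumed combination: linear interpolation of the Geo-LM and the general LM."""
    geo = GEO_LMS[region]
    p_general = GENERAL_LM.get(word, GENERAL_LM["<unk>"])
    p_geo = geo.get(word, geo["<unk>"])
    return math.log(lam * p_geo + (1 - lam) * p_general)


region = select_region(22.53, 113.93)  # a user located in Shenzhen
print(region)  # shenzhen
print(interpolated_logprob("腾讯大厦", region) > interpolated_logprob("腾讯大厦", "beijing"))  # True
```

Interpolation keeps the general LM in play, so POIs outside the user's region remain recognizable, just with lower prior probability.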

Findings

01

Geo-AM reduces CER by 6.5% to 10.1% (relative) on an accent test set.

02

Combining Geo-AM and Geo-LMs yields a relative CER reduction of over 18.7% on Tencent Map.

03

The system handles both multi-dialect speech and long-tail POI names.
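These findings are stated as relative CER reductions. CER is the character-level edit distance (substitutions, deletions, and insertions) divided by the number of reference characters. A relative reduction compares two CERs as (baseline - new) / baseline. The minimal sketch below shows both computations. The POI strings are made-up illustrations, not data from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference character
                curr[j - 1] + 1,          # insertion of a hypothesis character
                prev[j - 1] + (r != h),   # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus CER: total character edits divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative CER reduction, e.g. 0.10 -> 0.09 is a 10% relative reduction."""
    return (baseline - improved) / baseline


# Illustrative POI transcripts (hypothetical, not from the paper).
refs = ["腾讯大厦", "世界之窗"]
baseline_hyps = ["腾讯大夏", "世界知窗"]  # one homophone substitution in each
improved_hyps = ["腾讯大厦", "世界知窗"]  # one substitution fixed
base, new = cer(refs, baseline_hyps), cer(refs, improved_hyps)
print(base, new, relative_reduction(base, new))  # 0.25 0.125 0.5
```

A relative reduction is not a drop in percentage points. Going from 10% to 9% CER is a 1-point absolute reduction but a 10% relative reduction.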

Abstract

Nowadays, voice search for points of interest (POI) is becoming increasingly popular. However, speech recognition for local POI has remained a challenge because of the many dialects involved and the massive number of POIs. This paper improves speech recognition accuracy for local POI in two ways. First, a geographic acoustic model (Geo-AM) is proposed. The Geo-AM handles the multi-dialect problem using a dialect-specific input feature and a dialect-specific top layer. Second, a group of geo-specific language models (Geo-LMs) is integrated into our speech recognition system to improve recognition accuracy for long-tail and homophone POIs. During decoding, specific language models are selected on demand according to the user's geographic location. Experiments show that the proposed Geo-AM achieves a 6.5% to 10.1% relative character error rate (CER) reduction on an accent test set and the proposed Geo-AM and Geo-LM…
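The two Geo-AM ingredients named in the abstract can be sketched as a toy forward pass in pure Python. The sketch makes several assumptions. It treats the dialect-specific input feature as a one-hot dialect code concatenated to each acoustic frame. It also assumes the dialect index is already known, for instance derived from the user's location. The layer sizes, random weights, and `geo_am_forward` are hypothetical, and the paper's actual feature and network may differ. The sketch shows only the structure: shared lower layers, with a separate output layer per dialect.

```python
import math
import random

NUM_DIALECTS = 3   # hypothetical number of dialect regions
FEAT_DIM = 4       # toy acoustic feature size (real systems use far more)
HIDDEN_DIM = 8     # toy hidden layer size
NUM_UNITS = 5      # toy number of output units (e.g. senones or characters)

rng = random.Random(0)  # fixed seed so the toy weights are reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    mx = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


# Shared hidden layer: sees the acoustic frame plus the one-hot dialect code.
SHARED = rand_matrix(HIDDEN_DIM, FEAT_DIM + NUM_DIALECTS)
# One output (top) layer per dialect; everything below it is shared.
TOP = [rand_matrix(NUM_UNITS, HIDDEN_DIM) for _ in range(NUM_DIALECTS)]


def geo_am_forward(frame, dialect):
    """Posterior over output units for one frame, conditioned on the dialect index."""
    one_hot = [1.0 if d == dialect else 0.0 for d in range(NUM_DIALECTS)]
    x = frame + one_hot                              # dialect-specific input feature
    h = [math.tanh(v) for v in matvec(SHARED, x)]    # shared tanh hidden layer
    return softmax(matvec(TOP[dialect], h))          # dialect-specific top layer


frame = [0.1, -0.2, 0.3, 0.05]
probs = geo_am_forward(frame, dialect=1)
print(len(probs), round(sum(probs), 6))  # 5 1.0
```

Because only the top layer and the one-hot input differ per dialect, most parameters are shared across dialects. This is one plausible way to let data-poor dialects benefit from the pooled training data.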

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing