Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Wenliang Dai; Samuel Cahyawijaya; Tiezheng Yu; Elham J Barezi; Pascale Fung

arXiv:2207.02663·cs.CL·July 7, 2022

Open Access

TL;DR

This paper introduces a new Cantonese audio-visual dataset and challenge for in-car speech recognition, addressing the data scarcity that low-resource languages face in in-car smart assistant research.

Contribution

It provides a novel Cantonese in-car audio-visual dataset and establishes a new benchmark challenge to promote research in low-resource speech recognition in automotive environments.

Findings

01 New Cantonese in-car audio-visual dataset released

02 Benchmark challenge established for low-resource speech recognition

03 Encourages development of speech recognition models for underrepresented languages

Abstract

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, in this research field, most datasets are in major languages, such as English and Chinese. There is a huge data scarcity issue for low-resource languages, hindering the development of research and applications for broader communities. Therefore, it is crucial to have more benchmarks to raise awareness and motivate the research in low-resource languages. To mitigate this problem, we collect a new dataset, namely Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car speech recognition in the Cantonese language with video and…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Subtitles and Audiovisual Media