CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge

Chen Chen; Zehua Liu; Xiaolou Li; Lantian Li; Dong Wang

arXiv:2406.10313·cs.CL·June 18, 2024

TL;DR

The CNVSRC 2023 challenge evaluated large vocabulary continuous visual speech recognition for Chinese, demonstrating significant improvements over baselines in single- and multi-speaker tasks and reviewing effective techniques.

Contribution

First Chinese continuous visual speech recognition challenge providing comprehensive evaluation and analysis of techniques for single- and multi-speaker tasks.

Findings

01 Best system outperformed baseline significantly

02 Single-speaker task achieved higher accuracy

03 Effective techniques identified for visual speech recognition

Abstract

The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers. The challenge yielded highly successful results, with the best submission significantly outperforming the baseline, particularly in the single-speaker task. This paper comprehensively reviews the challenge, encompassing the data profile, task specifications, and baseline system construction. It also summarises the representative techniques employed by the submitted systems, highlighting the most effective approaches. Additional information and resources about this challenge can be accessed through the official website at http://cnceleb.org/competition.

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training