CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language
Jiasong Wu, Xuan Li, Taotao Li, Fanman Meng, Youyong Kong, Guanyu Yang, Lotfi Senhadji, Huazhong Shu

TL;DR
This paper introduces CSLNSpeech, a deep learning framework that leverages audio, face, and sign language modalities to improve speech separation, especially aiding hearing-impaired individuals, and demonstrates its effectiveness on a new large-scale dataset and existing benchmarks.
Contribution
The paper presents a novel multi-modal speech separation model incorporating sign language, along with a large-scale Chinese Sign Language News Speech dataset for training and evaluation.
Findings
The model outperforms conventional audio-visual systems in both separation performance and robustness.
Sign language alone can effectively supervise speech separation.
The framework achieves competitive results on multiple datasets.
Abstract
Previous audio-visual speech separation methods use the synchronization of the speaker's facial movement and speech in the video to supervise the speech separation in a self-supervised way. In this paper, we propose a model to solve the speech separation problem assisted by both face and sign language, which we call the extended speech separation problem. We design a general deep learning network for learning the combination of three modalities, audio, face, and sign language information, for better solving the speech separation problem. To train the model, we introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset, in which three modalities of audio, face, and sign language coexist. Experiment results show that the proposed model has better performance and robustness than the usual audio-visual system. Besides, sign language modality can also be…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net
