CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language

Jiasong Wu; Xuan Li; Taotao Li; Fanman Meng; Youyong Kong; Guanyu Yang; Lotfi Senhadji; Huazhong Shu

arXiv:2007.10629 · eess.AS · November 6, 2023



TL;DR

This paper introduces a deep learning framework that combines audio, face, and sign language modalities to improve speech separation, with particular relevance to hearing-impaired individuals, and demonstrates its effectiveness on a new large-scale dataset, CSLNSpeech, as well as on existing benchmarks.

Contribution

The paper presents a novel multi-modal speech separation model incorporating sign language, along with a large-scale Chinese Sign Language News Speech dataset for training and evaluation.

Findings

01

The model outperforms traditional audio-visual systems in accuracy and robustness.

02

Sign language alone can effectively supervise speech separation.

03

The framework achieves competitive results on multiple datasets.

Abstract

Previous audio-visual speech separation methods use the synchronization of the speaker's facial movement and speech in the video to supervise the speech separation in a self-supervised way. In this paper, we propose a model to solve the speech separation problem assisted by both face and sign language, which we call the extended speech separation problem. We design a general deep learning network for learning the combination of three modalities, audio, face, and sign language information, for better solving the speech separation problem. To train the model, we introduce a large-scale dataset named the Chinese Sign Language News Speech (CSLNSpeech) dataset, in which three modalities of audio, face, and sign language coexist. Experiment results show that the proposed model has better performance and robustness than the usual audio-visual system. Besides, sign language modality can also be…
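The abstract describes a network that learns to combine audio, face, and sign language information for separation. As an illustration only, the sketch below shows one common way such systems are framed: two known speakers are mixed, an ideal ratio mask serves as the training target, and a model predicts a time-frequency mask from the mixture plus per-speaker face and sign-language embeddings. Everything here is a hypothetical simplification and not the architecture of the paper or of the iveveive/slnspeech repository: the helper names (`mix`, `ideal_ratio_mask`, `predict_mask`, `apply_mask`, `l1_loss`), forming the mixture by summing magnitudes, concatenation as the fusion step, and a single linear unit standing in for the deep network.

```python
import math
from typing import List, Sequence

# Hypothetical toy types: a magnitude spectrogram as rows (frequency) of time frames.
Spectrogram = List[List[float]]


def mix(s1: Spectrogram, s2: Spectrogram) -> Spectrogram:
    """Mix two sources by summing magnitudes (ignores phase; a simplification)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]


def ideal_ratio_mask(target: Spectrogram, other: Spectrogram, eps: float = 1e-8) -> Spectrogram:
    """Fraction of each time-frequency bin's magnitude owned by the target speaker."""
    return [[t / (t + o + eps) for t, o in zip(rt, ro)] for rt, ro in zip(target, other)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_mask(
    mixture: Spectrogram,
    face_emb: Sequence[float],
    sign_emb: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> Spectrogram:
    """Toy fusion (assumption): concatenate each bin's mixture magnitude with the
    speaker's face and sign-language embeddings, then apply one linear unit + sigmoid."""
    visual = list(face_emb) + list(sign_emb)  # concatenation-based fusion
    if len(weights) != 1 + len(visual):
        raise ValueError("weights must cover the audio value plus both embeddings")
    mask = []
    for row in mixture:
        mask_row = []
        for value in row:
            features = [value] + visual
            logit = sum(w * x for w, x in zip(weights, features)) + bias
            mask_row.append(sigmoid(logit))  # mask value in (0, 1)
        mask.append(mask_row)
    return mask


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Estimated target magnitude = mask * mixture, bin by bin."""
    return [[m * v for m, v in zip(rm, rv)] for rm, rv in zip(mask, mixture)]


def l1_loss(pred: Spectrogram, target: Spectrogram) -> float:
    """Mean absolute error between predicted and ideal masks (the training signal)."""
    diffs = [abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    speaker_a = [[1.0, 2.0], [0.5, 0.0]]
    speaker_b = [[3.0, 2.0], [0.5, 1.0]]
    mixture = mix(speaker_a, speaker_b)
    target_mask = ideal_ratio_mask(speaker_a, speaker_b)
    face, sign = [0.2, -0.1], [0.4]  # made-up embeddings for speaker A
    pred_mask = predict_mask(mixture, face, sign, weights=[0.0] * 4, bias=0.0)
    print("estimate:", apply_mask(mixture, pred_mask))
    print("loss:", l1_loss(pred_mask, target_mask))
```

In a real system the single linear unit would be replaced by learned encoders for each modality and a separation network, and the weights would be trained by minimizing the mask loss over many mixtures; the sketch only shows how the pieces connect.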


Code & Models

Repositories

iveveive/slnspeech
PyTorch · Official


Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net