Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

Cheng Yu; Kuo-Hsuan Hung; Syu-Siang Wang; Szu-Wei Fu; Yu Tsao; Jeih-weih Hung

arXiv:1911.09847 · eess.AS · August 25, 2020

TL;DR

This paper introduces a time-domain multi-modal speech enhancement system combining bone- and air-conducted signals, demonstrating significant performance improvements over single-source methods using deep learning and ensemble strategies.

Contribution

It presents a novel multi-modal SE framework utilizing bone- and air-conducted signals with ensemble fusion strategies, advancing speech enhancement techniques.

Findings

01. Multi-modal SE outperforms single-source SE in various metrics.

02. Late fusion strategy yields better results than early fusion.

03. The proposed method improves speech quality in Mandarin corpus experiments.

Abstract

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE). However, video clips usually contain large amounts of data and pose a high cost in terms of computational resources and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while manifesting speech-phoneme structures, and thus complements its air-conducted counterpart. In this study, we propose a novel multi-modal SE structure in the time domain that leverages bone- and air-conducted signals. In addition, we examine two ensemble-learning-based strategies, early fusion (EF) and late fusion (LF), to integrate the two types of speech signals, and adopt a deep learning-based fully convolutional network to conduct the enhancement. The experiment results on the Mandarin corpus…
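The difference between the two fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' network: a single 1-D convolution followed by `tanh` stands in for the fully convolutional model, the kernels are random rather than trained, and every name here (`conv1d_same`, `early_fusion`, `late_fusion`, the kernel shapes, the 16000-sample length) is a hypothetical chosen for illustration. Early fusion stacks the air- and bone-conducted waveforms as input channels of one model; late fusion runs a separate model per stream and merges their outputs in a further stage.

```python
import numpy as np

def conv1d_same(x, kernels):
    """Sum of per-channel 'same'-length convolutions.

    x: (channels, samples), kernels: (channels, taps) -> (samples,)
    """
    return sum(np.convolve(x[c], kernels[c], mode="same")
               for c in range(x.shape[0]))

def early_fusion(air, bone, kernels):
    # EF: the two waveforms are stacked as channels and fed to ONE model.
    stacked = np.stack([air, bone])                  # (2, samples)
    return np.tanh(conv1d_same(stacked, kernels))

def late_fusion(air, bone, k_air, k_bone, k_fuse):
    # LF: each stream gets its own model; a fusion stage merges the outputs.
    est_air = np.tanh(conv1d_same(air[None, :], k_air))
    est_bone = np.tanh(conv1d_same(bone[None, :], k_bone))
    fused_in = np.stack([est_air, est_bone])         # (2, samples)
    return np.tanh(conv1d_same(fused_in, k_fuse))

# Stand-in data: random waveforms and untrained (random) kernels.
rng = np.random.default_rng(0)
samples, taps = 16000, 9
air = rng.standard_normal(samples)    # placeholder noisy air-conducted signal
bone = rng.standard_normal(samples)   # placeholder bone-conducted signal

ef_out = early_fusion(air, bone, 0.1 * rng.standard_normal((2, taps)))
lf_out = late_fusion(
    air, bone,
    0.1 * rng.standard_normal((1, taps)),
    0.1 * rng.standard_normal((1, taps)),
    0.1 * rng.standard_normal((2, taps)),
)
```

Both functions map two waveforms of equal length to one enhanced waveform of the same length; they differ only in where the two modalities are combined. In a trained system the kernels would be learned and each stage would be a multi-layer network.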
