CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

Yu-Wen Chen; Kuo-Hsuan Hung; You-Jin Li; Alexander Chao-Fu Kang; Ya-Hsin Lai; Kai-Chun Liu; Szu-Wei Fu; Syu-Siang Wang; Yu Tsao

arXiv:2008.09264·eess.AS·April 26, 2022

TL;DR

CITISEN is a mobile app utilizing deep learning for speech enhancement, model adaptation, and background noise conversion, demonstrating significant improvements in speech quality and intelligibility in noisy environments.

Contribution

The paper introduces CITISEN, a novel mobile application that integrates multiple deep learning functions for speech processing, including a unique background noise conversion feature for data augmentation.

Findings

01

Compared with the noisy input, enhanced speech signals improved by 6% in STOI and 33% in PESQ.

02

Model adaptation further improved STOI and PESQ by approximately 6% and 11%, respectively.

03

Background noise conversion maintained scene classification accuracy, aiding data augmentation.

Abstract

This study presents CITISEN, a deep learning-based speech signal-processing mobile application. CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC). Together, these allow CITISEN to serve as a platform for using and evaluating SE models and for flexibly extending those models to new noise environments and users. For SE, a pretrained SE model downloaded from the cloud server reduces noise components in live or saved recordings provided by users. When unseen noise or speaker environments are encountered, the MA function is applied to improve CITISEN's performance: a few audio samples recorded in the noisy environment are uploaded and used to adapt the pretrained SE model on the server. Finally, for BNC, CITISEN first removes the background noise with an SE model and then mixes the processed speech with…
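
The BNC steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract is truncated before it says what the enhanced speech is mixed with, so mixing with a new background-noise recording at a chosen signal-to-noise ratio (SNR) is an assumption. The names `mean_power`, `mix_at_snr`, and `convert_background` are hypothetical, and the `enhance` callable is only a stand-in for the SE model.

```python
import math
import random


def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise power ratio is snr_db.

    Assumption: SNR-controlled additive mixing; the source does not state
    how CITISEN performs the mixing step.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # Noise power that yields the requested SNR against the speech power.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def convert_background(noisy_speech, enhance, new_noise, snr_db):
    """Two-step BNC sketch: enhance first, then mix in the new background.

    `enhance` is a hypothetical callable standing in for the SE model.
    """
    enhanced = enhance(noisy_speech)
    return mix_at_snr(enhanced, new_noise, snr_db)


if __name__ == "__main__":
    # Toy signals: a 440 Hz tone as "speech" and uniform noise as the new background.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    rng = random.Random(0)
    background = [rng.uniform(-1, 1) for _ in range(1600)]
    # Identity function used in place of a real SE model.
    converted = convert_background(tone, lambda x: list(x), background, snr_db=5.0)
    print(len(converted))
```

In a real pipeline the identity `enhance` above would be replaced by a trained SE model, and the signals would be audio arrays rather than Python lists.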

Code & Models

Repositories

yuwchen/CITISEN (official)
