TL;DR
CITISEN is a mobile app utilizing deep learning for speech enhancement, model adaptation, and background noise conversion, demonstrating significant improvements in speech quality and intelligibility in noisy environments.
Contribution
The paper introduces CITISEN, a novel mobile application that integrates multiple deep learning functions for speech processing, including a unique background noise conversion feature for data augmentation.
Findings
Compared with the noisy input, enhanced speech signals showed improvements of approximately 6% in STOI and 33% in PESQ.
Model adaptation further improved STOI and PESQ by approximately 6% and 11%, respectively.
Background noise conversion maintained scene classification accuracy, aiding data augmentation.
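The percentages above are easiest to read as relative changes in a metric score. The following is a minimal sketch of that arithmetic, assuming the improvements are measured relative to a baseline score (the summary does not state this explicitly). The function name `relative_improvement` and the scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the percentage change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical metric scores, chosen only to illustrate the arithmetic;
# they are NOT results reported for CITISEN.
baseline_score = 2.0
improved_score = 2.5

gain = relative_improvement(baseline_score, improved_score)
print(f"Relative improvement: {gain:.1f}%")
```

Under this reading, a 33% PESQ improvement would mean the enhanced score is about 1.33 times the score of the noisy input.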
Abstract
This study presents CITISEN, a deep learning-based speech signal-processing mobile application. CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC). These allow CITISEN to serve as a platform for using and evaluating SE models and for flexibly extending those models to new noise environments and users. For SE, a pretrained SE model downloaded from the cloud server reduces noise components in live or saved recordings provided by users. When unseen noise or speaker conditions are encountered, the MA function is applied: a few audio samples recorded in the noisy environment are uploaded and used to adapt the pretrained SE model on the server. Finally, for BNC, CITISEN first removes the background noise with an SE model and then mixes the processed speech with…
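The two-step BNC flow described above (enhance first, then mix) can be sketched as follows. The abstract is cut off before it says what the processed speech is mixed with, so this sketch assumes a replacement noise recording mixed in at a chosen signal-to-noise ratio. That assumption, the SNR-based scaling, and the names `mix_at_snr`, `convert_background`, and `enhance` are all hypothetical and should not be read as the app's actual implementation.

```python
import math
from typing import Callable, List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Scale `noise` so the speech-to-noise ratio is `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise must not be silent")
    # SNR in dB is 20 * log10(speech_rms / noise_rms), solved for noise_rms.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


def convert_background(noisy: Sequence[float], new_noise: Sequence[float],
                       enhance: Callable[[Sequence[float]], Sequence[float]],
                       snr_db: float) -> List[float]:
    """Assumed BNC flow: enhance the input, then mix in a new background."""
    cleaned = enhance(noisy)  # stand-in for the SE model
    return mix_at_snr(cleaned, new_noise, snr_db)


# Toy usage with an identity "enhancer" in place of a real SE model.
speech = [1.0, -1.0, 1.0, -1.0]
new_noise = [0.5, 0.5, -0.5, -0.5]
mixed = convert_background(speech, new_noise, enhance=list, snr_db=0.0)
print(mixed)
```

In a real pipeline, `enhance` would wrap the pretrained SE model, and the signals would be audio sample arrays rather than short lists.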
