Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication
Yuanyuan Bao, Yanze Xu, Na Xu, Wenjing Yang, Hongfeng Li, Shicong Li, Yongtao Jia, Fei Xiang, Jincheng He, Ming Li

TL;DR
This paper introduces a lightweight dual-channel model, LSTM-Former, for target speaker separation on mobile devices, using a new dual-channel dataset, LibriPhone, to improve performance in real-world scenarios.
Contribution
The paper presents a novel dual-channel dataset, LibriPhone, and a lightweight LSTM-Former model optimized for mobile target speaker separation tasks.
Findings
Dual-channel LSTM-Former outperforms single-channel by 25%
LibriPhone dataset mimics real-world mobile scenarios
Lightweight model suitable for mobile deployment
Abstract
Nowadays, there is a strong need to deploy target speaker separation (TSS) models on mobile devices, where model size and computational complexity are limited. To better perform TSS for mobile voice communication, we first build a dual-channel dataset for this specific scenario, LibriPhone. Specifically, to better mimic real-world conditions, instead of being simulated from a single-channel dataset, LibriPhone is made by simultaneously replaying pairs of utterances from LibriSpeech through two professional artificial heads and recording them with two built-in microphones of a mobile phone. Then, we propose a lightweight time-frequency domain separation model, LSTM-Former, which is based on the LSTM framework with a scale-invariant source-to-noise ratio (SI-SNR) loss. For the experiments on LibriPhone, we explore the dual-channel LSTM-Former model and a single-channel version by a random single channel of…
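The abstract names an SI-SNR loss but this summary does not give its formula. As a rough illustration, the sketch below follows the commonly used definition of scale-invariant SNR for a single pair of time-domain signals; the function names `si_snr` and `si_snr_loss`, the `eps` stabilizer, and the plain-list signal representation are assumptions made for this example, not details taken from the paper.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for two equal-length 1-D signals (illustrative)."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target: this is the "target" component.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    s_target = [dot / t_energy * b for b in t]
    # The residual after projection is treated as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]
    num = sum(x * x for x in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_snr_loss(estimate, target):
    """Negative SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property the "scale-invariant" in the name refers to. A training setup would normally compute this on batched tensors in a deep learning framework rather than on Python lists.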
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
