TF-MLPNet: Tiny Real-Time Neural Speech Separation
Malek Itani, Tuochao Chen, Shyamnath Gollakota

TL;DR
TF-MLPNet is a neural speech separation model designed to run in real time on the low-power neural accelerators used in hearable devices, where it is both faster than prior models and more accurate than existing streaming models.
Contribution
The paper introduces TF-MLPNet, the first neural speech separation network that runs in real time on tiny, low-power accelerators while outperforming existing streaming models.
Findings
Runs in real time on the GAP9 processor, processing 6 ms audio chunks.
Runs 3.5–4x faster than previous models.
Outperforms existing streaming models in blind speech separation and target speech extraction.
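What "real time" means for 6 ms chunks can be made concrete with a little arithmetic: a streaming model keeps up only if each chunk is processed in less time than the chunk lasts. The sketch below assumes a 16 kHz sample rate and uses made-up processing times purely for illustration; neither is a figure from the paper, and the helper names are hypothetical.

```python
def chunk_samples(chunk_ms: float, sample_rate_hz: int) -> int:
    """Number of audio samples in one chunk lasting chunk_ms milliseconds."""
    return round(chunk_ms * sample_rate_hz / 1000)


def real_time_factor(processing_ms: float, chunk_ms: float) -> float:
    """Compute time per chunk divided by the chunk's duration."""
    return processing_ms / chunk_ms


def is_real_time(processing_ms: float, chunk_ms: float) -> bool:
    """A streaming model keeps up only if each chunk finishes before the next arrives."""
    return real_time_factor(processing_ms, chunk_ms) < 1.0


# Assumed 16 kHz audio (not stated in this summary): a 6 ms chunk holds 96 samples.
print(chunk_samples(6, 16_000))  # 96

# Hypothetical timings: 4.5 ms per chunk keeps up; a model 4x slower (18 ms) does not.
print(is_real_time(4.5, 6))      # True
print(is_real_time(4.5 * 4, 6))  # False
```

A constant-factor speedup such as the reported 3.5–4x decides whether a model can stream at all whenever the slower model's real-time factor lies between 1 and that factor.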
Abstract
Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks…
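The architecture described in the abstract can be sketched in NumPy to show how the two kinds of processing fit together. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, toy sizes, ReLU activations, residual connections, weight sharing across bins, and the single causal convolution are all hypothetical choices, and normalization and quantization are omitted. Features are laid out as (time, frequency, channel); within each frame, fully connected layers alternate along the channel and frequency axes, and a causal convolution then runs along time separately at each frequency bin.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def mixer_block(frame, w_ch, w_freq):
    """Alternating fully connected layers on one frame (hypothetical helper).

    frame:  (F, C) features of a single time frame.
    w_ch:   (C, C) layer along the channel axis, shared by all F bins.
    w_freq: (F, F) layer along the frequency axis, shared by all C channels.
    """
    frame = frame + relu(frame @ w_ch)    # mix channels within each bin
    frame = frame + relu(w_freq @ frame)  # mix bins within each channel
    return frame


def causal_time_conv(x, kernel):
    """Causal convolution along time, run separately at every frequency bin.

    x:      (T, F, C) sequence of frames.
    kernel: (K, C, C); kernel[k] weights the frame k steps in the past.
    Bins never exchange information here, and no future frame is used.
    """
    T, K = x.shape[0], kernel.shape[0]
    out = np.zeros_like(x)
    for t in range(T):
        for k in range(min(K, t + 1)):
            out[t] += x[t - k] @ kernel[k]  # (F, C) @ (C, C) -> (F, C)
    return out


def tf_block(x, w_ch, w_freq, kernel):
    """Hypothetical block: per-frame frequency processing, then per-bin time processing."""
    mixed = np.stack([mixer_block(frame, w_ch, w_freq) for frame in x])
    return mixed + causal_time_conv(mixed, kernel)


rng = np.random.default_rng(0)
T, F, C, K = 8, 16, 4, 3  # toy sizes, not the paper's configuration
x = rng.standard_normal((T, F, C))
w_ch = 0.1 * rng.standard_normal((C, C))
w_freq = 0.1 * rng.standard_normal((F, F))
kernel = 0.1 * rng.standard_normal((K, C, C))
y = tf_block(x, w_ch, w_freq, kernel)
print(y.shape)  # (8, 16, 4)
```

Because the frequency-axis layers touch only the current frame and the time convolution looks only backward, each new frame can be processed as soon as it arrives, which streaming operation requires. Keeping the per-frame work to small matrix multiplies is plausibly what makes this style of network a good fit for tiny accelerators.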
