TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants
Hsin-Tien Chiang, John H. L. Hansen

TL;DR
TokenSE is a speech enhancement framework for cochlear implants that uses a Mamba-based model to predict clean neural audio codec token indices from degraded speech. Its compute scales linearly with sequence length, and it outperforms baseline methods in objective evaluations.
Contribution
The paper introduces TokenSE, a Mamba-based discrete token speech enhancement framework that outperforms baselines and is computationally efficient for cochlear implant applications.
Findings
In objective evaluations, TokenSE consistently outperforms baseline methods on multiple datasets.
Subjective tests show improved speech intelligibility for CI users.
The Mamba-based model has computational complexity that is linear in sequence length, which makes it suitable for real-time applications.
Abstract
Speech enhancement (SE) is critical for improving speech intelligibility and quality in real-world environments, particularly for cochlear implant (CI) users who experience severe degradations in speech understanding under noisy and reverberant conditions. In this study, we propose TokenSE, a discrete token-based SE framework operating in the neural audio codec space, which predicts clean codec token indices from degraded speech using a Mamba-based model. Unlike the earlier Transformer architecture, whose self-attention mechanism has a computational complexity that grows quadratically with sequence length, the input-dependent selection mechanism of Mamba achieves linear complexity, making it a compelling alternative to Transformers, especially for CI and hearing-aid (HA) applications. Objective evaluations show that TokenSE consistently outperforms baseline methods on both in-domain and…
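The complexity contrast drawn in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation and not the real Mamba kernel; it is a minimal scalar recurrence, assuming a sigmoid gate computed from the current input, that shows why an input-dependent scan needs one state update per frame while self-attention scores every pair of frames. The names `selective_scan`, `attention_pair_count`, `w_gate`, and `w_in` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, used here as a toy input-dependent gate."""
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(x, w_gate=1.0, w_in=1.0):
    """Toy scalar recurrence with an input-dependent gate.

    h_t = a_t * h_{t-1} + (1 - a_t) * w_in * x_t, with a_t = sigmoid(w_gate * x_t).
    This is an illustrative assumption, not the parameterization used in Mamba.
    Returns the per-step states and the number of state updates performed.
    """
    h = 0.0
    outputs = []
    steps = 0
    for x_t in x:
        a_t = sigmoid(w_gate * x_t)  # how much of the old state to retain
        h = a_t * h + (1.0 - a_t) * w_in * x_t
        outputs.append(h)
        steps += 1  # exactly one update per input frame
    return outputs, steps


def attention_pair_count(seq_len):
    """Number of query-key pairs scored by full self-attention."""
    return seq_len * seq_len


if __name__ == "__main__":
    for seq_len in (100, 200, 400):
        _, steps = selective_scan([0.1] * seq_len)
        print(seq_len, steps, attention_pair_count(seq_len))
```

Doubling the sequence length doubles the number of scan updates but quadruples the number of attention pairs, which is the scaling difference the abstract refers to.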
