Onssen: an open-source speech separation and enhancement library
Zhaoheng Ni, Michael I Mandel

TL;DR
Onssen is an open-source library that facilitates speech separation and enhancement research by providing implementations of various deep learning algorithms, supporting customized datasets, and enabling easy comparison of methods.
Contribution
It introduces a comprehensive, open-source platform for speech separation that supports multiple algorithms and datasets, promoting reproducibility and benchmarking in the field.
Findings
Algorithms implemented in onssen match their reported performance
Supports most time-frequency mask-based methods
Enables easy comparison across algorithms and datasets
Abstract
Speech separation is an essential task for multi-talker speech recognition. Recently, many deep learning approaches have been proposed and have repeatedly advanced the state of the art. The lack of algorithm implementations restricts researchers to comparing methods on the same dataset. A generic platform can benefit researchers by making it easy to implement novel separation algorithms and compare them with existing ones on customized datasets. We introduce "onssen": an open-source speech separation and enhancement library. onssen is a library mainly for deep learning separation and enhancement algorithms. It uses the LibRosa and NumPy libraries for feature extraction and PyTorch as the back-end for model training. onssen supports most time-frequency mask-based separation algorithms (e.g. deep clustering, chimera net, chimera++, and so on) and also supports…
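To illustrate what "time-frequency mask-based" separation means, here is a minimal NumPy sketch. It is not onssen's API: the function names `ideal_ratio_masks` and `apply_masks` are hypothetical, the spectrograms are random stand-ins for real STFTs, and the masks are oracle ratio masks computed from known sources. The deep learning methods named in the abstract instead train a network whose output is used to obtain such masks from the mixture alone.

```python
import numpy as np


def ideal_ratio_masks(source_mags, eps=1e-8):
    """Compute oracle ratio masks from source magnitude spectrograms.

    source_mags: array of shape (n_sources, n_freq, n_frames).
    Returns masks of the same shape with values in [0, 1] that sum to
    approximately 1 across sources in every time-frequency bin.
    """
    source_mags = np.asarray(source_mags, dtype=np.float64)
    total = source_mags.sum(axis=0, keepdims=True)
    return source_mags / (total + eps)


def apply_masks(mixture_spec, masks):
    """Multiply the complex mixture spectrogram by each source's mask."""
    return masks * mixture_spec[np.newaxis, ...]


# Hypothetical stand-ins for two speakers' complex STFTs (random data,
# 129 frequency bins by 50 frames), used only to exercise the functions.
rng = np.random.default_rng(0)
shape = (129, 50)
s1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
s2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = s1 + s2

masks = ideal_ratio_masks(np.stack([np.abs(s1), np.abs(s2)]))
estimates = apply_masks(mixture, masks)  # shape (2, 129, 50), complex
```

Because the masks sum to about one in each bin, the masked estimates add back up to the mixture. With real audio, the complex spectrograms would come from an STFT of the waveforms, and each estimate would be inverted back to a time-domain signal.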
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
