SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit
Wen-Chin Huang, Erica Cooper, Tomoki Toda

TL;DR
SHEET is an open-source toolkit that uses deep neural networks to predict human-assessed speech quality, supporting multiple datasets and models to facilitate research in subjective speech quality evaluation.
Contribution
The paper introduces a multi-purpose toolkit for speech human evaluation estimation research, offering training and evaluation scripts, multi-dataset and multi-model support, and pre-trained models.
Findings
Re-evaluated SSL-MOS with an extensive list of speech SSL models on two SSQA datasets, BVCC and NISQA
Identified the best-performing speech SSL model, which surpassed the original SSL-MOS
Achieved performance comparable to state-of-the-art methods
Abstract
We introduce SHEET, a multi-purpose open-source toolkit designed to accelerate subjective speech quality assessment (SSQA) research. SHEET stands for the Speech Human Evaluation Estimation Toolkit, which focuses on data-driven deep neural network-based models trained to predict human-labeled quality scores of speech samples. SHEET provides comprehensive training and evaluation scripts, multi-dataset and multi-model support, as well as pre-trained models accessible via Torch Hub and HuggingFace Spaces. To demonstrate its capabilities, we re-evaluated SSL-MOS, a speech self-supervised learning (SSL)-based SSQA model widely used in recent scientific papers, on an extensive list of speech SSL models. Experiments were conducted on two representative SSQA datasets named BVCC and NISQA, and we identified the optimal speech SSL model, whose performance surpassed the original SSL-MOS…
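The abstract describes models that predict human-labeled quality scores and scripts that evaluate them. As a rough illustration of what such an evaluation compares, the sketch below scores hypothetical predictions against hypothetical human ratings. The three metrics (mean squared error, linear correlation, Spearman rank correlation) are common choices in SSQA work but are an assumption here, since the text above does not name them. The function names and sample scores are made up for illustration and are not SHEET's API or data.

```python
from statistics import mean


def mse(pred, true):
    """Mean squared error between predicted and human scores."""
    return mean((p - t) ** 2 for p, t in zip(pred, true))


def pearson(x, y):
    """Linear (Pearson) correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-utterance scores on a 1-5 scale (illustrative only).
human_mos = [1.5, 2.0, 3.5, 4.0, 4.5]
predicted = [1.8, 2.4, 3.0, 4.2, 4.1]

print("MSE :", mse(predicted, human_mos))
print("LCC :", pearson(predicted, human_mos))
print("SRCC:", spearman(predicted, human_mos))
```

Lower MSE means the predictions sit closer to the human scores in absolute terms, while the two correlations indicate how well the predictions track the human ordering. A model can rank samples well and still be offset in scale, which is why such metrics are usually read together.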
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
