DNN No-Reference PSTN Speech Quality Prediction
Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner

TL;DR
This paper introduces a new open-source PSTN speech quality dataset and a no-reference DNN model that outperforms existing standards, enabling more reliable live monitoring of PSTN speech quality.
Contribution
The paper presents a novel open-source PSTN speech quality dataset of over 1000 crowdsourced real phone calls and a DNN-based no-reference model that surpasses both the full-reference POLQA and the no-reference P.563.
Findings
The proposed model outperforms POLQA and P.563 on validation and test sets.
File cropping and the number of ratings influence model accuracy.
The dataset enables better generalization across different PSTN networks.
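One way to see why the number of ratings per file matters: a crowdsourced mean opinion score (MOS) is an average of a handful of noisy votes, so its uncertainty shrinks as more votes are collected. The sketch below is illustrative only and is not taken from the paper; the function name `mos_with_ci`, the example ratings, and the normal-approximation 95% interval are all assumptions.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper: uses a normal approximation, which is only a
    rough guide when there are few ratings per file.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mos = statistics.mean(ratings)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * std_err


# Made-up ratings on a 1-5 scale for a single call recording.
few = [4, 2, 5, 3]
# The same voting pattern repeated four times, i.e. 16 ratings.
many = few * 4

mos_few, ci_few = mos_with_ci(few)
mos_many, ci_many = mos_with_ci(many)

print(f"{len(few)} ratings:  MOS = {mos_few:.2f} +/- {ci_few:.2f}")
print(f"{len(many)} ratings: MOS = {mos_many:.2f} +/- {ci_many:.2f}")
```

Both cases give the same MOS, but the interval is noticeably narrower with 16 ratings. Labels with wide intervals are noisier targets, both for training a model and for judging its accuracy.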
Abstract
Classic public switched telephone networks (PSTN) are often a black box for VoIP network providers, as they have no access to performance indicators, such as delay or packet loss. Only the degraded output speech signal can be used to monitor the speech quality of these networks. However, the current state-of-the-art speech quality models are not reliable enough to be used for live monitoring. One of the reasons for this is that PSTN distortions can be unique depending on the provider and country, which makes it difficult to train a model that generalizes well for different PSTN networks. In this paper, we present a new open-source PSTN speech quality test set with over 1000 crowdsourced real phone calls. Our proposed no-reference model outperforms the full-reference POLQA and no-reference P.563 on the validation and test set. Further, we analyzed the influence of file cropping on the…
