Quality assessment of voice converted speech using articulatory features
Avni Rajpal, Nirmesh J. Shah, Mohammadi Zaki, Hemant A. Patil

TL;DR
This paper introduces a novel method using articulatory features derived from acoustic-to-articulatory inversion to objectively assess the quality of voice converted speech, focusing on naturalness and intelligibility.
Contribution
It presents a new approach that uses articulatory features to quantify the loss of speech production information in voice conversion, validated against subjective quality scores.
Findings
An increase in RMSE for voice converted speech indicates a loss of articulatory information.
Decreased mutual information correlates with perceived quality degradation.
Articulatory features outperform mel cepstral distortion (MCD) in correlating with the Mean Opinion Score (MOS).
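The first two findings rest on two quantities computed over articulatory trajectories: RMSE and mutual information. The following is a minimal sketch of how such quantities could be computed, not the authors' implementation. The function names, the histogram-based mutual information estimator, the bin count, and the synthetic trajectories are all assumptions made for illustration; the paper's own estimator and data are not described in this summary.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error between two equal-length trajectories."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def mutual_information(x, y, bins=16):
    """Histogram estimate of mutual information in bits (hypothetical choice of estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probability table
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nonzero = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / (px @ py)[nonzero])))


# Synthetic stand-ins only: a smooth "measured" trajectory, a mildly noisy
# estimate (natural speech) and a noisier one (converted speech).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2000)
measured = np.sin(2 * np.pi * 3 * t)
from_natural = measured + 0.05 * rng.standard_normal(t.size)
from_converted = measured + 0.40 * rng.standard_normal(t.size)

print("RMSE natural  :", rmse(measured, from_natural))
print("RMSE converted:", rmse(measured, from_converted))
print("MI natural    :", mutual_information(measured, from_natural))
print("MI converted  :", mutual_information(measured, from_converted))
```

In this toy setup the noisier trajectory yields a higher RMSE and a lower mutual information with the measured trajectory, which mirrors the direction of the reported findings without reproducing any of the paper's numbers.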
Abstract
We propose a novel application based on acoustic-to-articulatory inversion towards quality assessment of voice converted speech. The ability of humans to speak effortlessly requires coordinated movements of various articulators, muscles, etc. This effortless movement contributes towards naturalness, intelligibility and speaker's identity, which are only partially present in voice converted speech. Hence, during voice conversion, the information related to speech production is lost. In this paper, this loss is quantified for a male voice by showing an increase in RMSE for voice converted speech, followed by showing a decrease in mutual information. Similar results are obtained for a female voice. This observation is extended by showing that articulatory features can be used as an objective measure. The effectiveness of the proposed measure over MCD is illustrated by comparing their correlation…
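For context on the baseline named in the abstract, MCD is commonly computed per frame as (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²) over mel-cepstral coefficients, and an objective measure is typically compared with listener scores through a correlation coefficient. The sketch below assumes that common MCD definition, time-aligned frames, and Pearson correlation; the function names are hypothetical and the correlation type used in the paper is not stated in this summary.

```python
import numpy as np


def mel_cepstral_distortion(mcep_ref, mcep_conv):
    """Mean MCD in dB over frames.

    Both inputs are (frames, dims) arrays that are assumed to be
    time-aligned already, with the energy coefficient removed by the caller.
    """
    diff = np.asarray(mcep_ref, dtype=float) - np.asarray(mcep_conv, dtype=float)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def pearson_correlation(objective_scores, subjective_scores):
    """Pearson correlation between an objective measure and listener scores."""
    return float(np.corrcoef(objective_scores, subjective_scores)[0, 1])
```

For a distortion-type measure such as MCD or RMSE, a strongly negative correlation with MOS would be the desirable outcome, since a larger error should go with a lower listener rating.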
