SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

Cheng-Hung Hu; Yu-Huai Peng; Junichi Yamagishi; Yu Tsao; Hsin-Min Wang

arXiv:2107.09392·eess.AS·March 28, 2022

TL;DR

SVSNet is an end-to-end neural network that predicts the speaker voice similarity between converted and natural speech directly from raw waveforms, outperforming well-known baseline systems in voice conversion evaluation.

Contribution

It introduces the first end-to-end model for speaker similarity assessment that uses raw waveforms, eliminating the need for hand-crafted features.

Findings

01 SVSNet outperforms well-known baseline systems on the VCC2018 and VCC2020 datasets.

02 It assesses speaker similarity at both the utterance level and the system level.

03 Its predictions correlate with human judgments more closely than those of the baseline systems.
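The difference between utterance-level and system-level assessment can be made concrete with a small sketch. It assumes Pearson correlation as the agreement measure and uses made-up scores; the record fields (`system`, `predicted`, `human`) and both helper functions are hypothetical names chosen for illustration, not taken from the paper or its repository.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def utterance_level(records):
    """Correlate predicted and human scores over individual utterance pairs."""
    return pearson([r["predicted"] for r in records],
                   [r["human"] for r in records])


def system_level(records):
    """Average scores per voice conversion system first, then correlate."""
    by_system = defaultdict(list)
    for r in records:
        by_system[r["system"]].append(r)
    predicted = [mean(r["predicted"] for r in rs) for rs in by_system.values()]
    human = [mean(r["human"] for r in rs) for rs in by_system.values()]
    return pearson(predicted, human)


# Made-up scores for three hypothetical systems, two utterances each.
records = [
    {"system": "A", "predicted": 1.0, "human": 2.0},
    {"system": "A", "predicted": 3.0, "human": 1.0},
    {"system": "B", "predicted": 2.0, "human": 3.0},
    {"system": "B", "predicted": 4.0, "human": 2.0},
    {"system": "C", "predicted": 3.0, "human": 4.0},
    {"system": "C", "predicted": 5.0, "human": 3.0},
]

print("utterance-level:", round(utterance_level(records), 3))
print("system-level:", round(system_level(records), 3))
```

In this toy data the per-utterance scores are noisy, but the per-system averages line up exactly, so the system-level correlation is much higher than the utterance-level one. This is why the two levels are usually reported separately.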

Abstract

Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention. In this paper, we propose SVSNet, the first end-to-end neural network model to assess the speaker voice similarity between converted speech and natural speech for voice conversion tasks. Unlike most neural evaluation metrics that use hand-crafted features, SVSNet directly takes the raw waveform as input to more completely utilize speech information for prediction. SVSNet consists of encoder, co-attention, distance calculation, and prediction modules and is trained in an end-to-end manner. The experimental results on the Voice Conversion Challenge 2018 and 2020 (VCC2018 and VCC2020) datasets show that SVSNet outperforms well-known baseline systems in the assessment of speaker similarity at the utterance and system levels.
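To illustrate how the four modules named in the abstract might fit together, here is a minimal pure-Python sketch of the data flow. Everything beyond the module names is an assumption made for illustration: the encoder is a stand-in (fixed random projection of waveform frames), the co-attention is scaled dot-product soft attention, the distance is a mean absolute difference, the prediction head is a hand-set linear map, and the two directions are averaged so the score is symmetric. None of this is the authors' implementation, and nothing here is trained.

```python
import math
import random


def encode(waveform, frame_len=4, dim=3, seed=0):
    """Stand-in encoder (assumption): frame the raw waveform and apply a fixed random projection."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(frame_len)] for _ in range(dim)]
    frames = [waveform[i:i + frame_len]
              for i in range(0, len(waveform) - frame_len + 1, frame_len)]
    return [[math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]
            for frame in frames]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Co-attention stand-in: re-express `memory` on the time axis of `query`."""
    dim = len(query[0])
    aligned = []
    for q in query:
        weights = softmax([sum(a * b for a, b in zip(q, m)) / math.sqrt(dim)
                           for m in memory])
        aligned.append([sum(w * m[k] for w, m in zip(weights, memory))
                        for k in range(dim)])
    return aligned


def distance(a, b):
    """Mean absolute difference per feature dimension, averaged over time."""
    dim = len(a[0])
    return [sum(abs(x[k] - y[k]) for x, y in zip(a, b)) / len(a) for k in range(dim)]


def predict(dist_vec, weights, bias):
    """Stand-in prediction head: larger distance gives a lower similarity score."""
    return bias + sum(w * d for w, d in zip(weights, dist_vec))


def similarity_score(wave_t, wave_r, head_weights=(-1.0, -1.0, -1.0), bias=4.0):
    """Run encoder, co-attention, distance, and prediction in both directions and average."""
    enc_t, enc_r = encode(wave_t), encode(wave_r)
    dist_t = distance(enc_t, attend(enc_t, enc_r))
    dist_r = distance(enc_r, attend(enc_r, enc_t))
    return 0.5 * (predict(dist_t, head_weights, bias) + predict(dist_r, head_weights, bias))


# Two synthetic "waveforms" of different lengths (not real speech).
converted = [math.sin(0.3 * n) for n in range(16)]
natural = [math.sin(0.5 * n) for n in range(24)]
print(similarity_score(converted, natural))
```

Because the attention step aligns one utterance to the time axis of the other, the two inputs do not need to have the same length. Averaging the two directions makes the score independent of argument order.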

Code & Models

Repositories

n1243645679976/svsnet (PyTorch)
