Speak Like a Dog: Human to Non-human creature Voice Conversion

Kohei Suzuki; Shoki Sakamoto; Tadahiro Taniguchi; Hirokazu Kameoka

arXiv:2206.04780 · cs.SD · January 18, 2023

TL;DR

This paper introduces a novel voice conversion task transforming human speech into dog-like sounds, exploring the feasibility and challenges of non-human voice synthesis while preserving linguistic content.

Contribution

The paper proposes the first framework for human-to-dog voice conversion, built on non-parallel VC methods, and compares different acoustic features, network architectures, and training criteria.

Findings

01. Mel-spectrograms enhance the dog-likeness of converted speech
02. Preserving linguistic information remains challenging
03. Different VC methods vary in sound quality and intelligibility

Abstract

This paper proposes a new voice conversion (VC) task from human speech to dog-like speech while preserving linguistic information, as an example of human to non-human creature voice conversion (H2NH-VC) tasks. Although most VC studies deal with human-to-human VC, H2NH-VC aims to convert human speech into non-human creature-like speech. Non-parallel VC makes H2NH-VC feasible, because we cannot collect a parallel dataset in which non-human creatures speak human language. In this study, we propose to use dogs as an example of a non-human creature target domain and define the "speak like a dog" task. To clarify the possibilities and characteristics of the "speak like a dog" task, we conducted a comparative experiment using existing representative non-parallel VC methods in terms of acoustic features (Mel-cepstral coefficients and Mel-spectrograms), network architectures (five different kernel-size…
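
One of the two acoustic features compared in the abstract is the Mel-spectrogram. As a rough illustration of what that feature is, the following sketch computes a log-Mel-spectrogram from a waveform using NumPy only. The sample rate, FFT size, hop length, number of Mel bands, and the helper names (`hz_to_mel`, `mel_filterbank`, `log_mel_spectrogram`) are assumptions chosen for the example, not settings or code from the paper, and the input is a synthetic tone rather than speech.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz-to-mel mapping (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centres are equally spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # Illustrative parameter values only; not taken from the paper.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (frames, n_mels)
    return np.log(mel + 1e-10)                            # small floor avoids log(0)

# Synthetic stand-in for a recording: one second of a 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)

features = log_mel_spectrogram(wave, sr=sr)
print(features.shape)  # (59, 80): 59 frames, 80 mel bands
```

Each row of `features` is one analysis frame and each column one Mel band. A VC model of the kind the abstract describes would take such frames as input and output; converting the result back to a waveform needs a separate vocoder step that this sketch does not cover.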

Code & Models

Repositories

suzuki256/dog-dataset (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing