Speak Like a Dog: Human to Non-human creature Voice Conversion
Kohei Suzuki, Shoki Sakamoto, Tadahiro Taniguchi, Hirokazu Kameoka

TL;DR
This paper introduces a novel voice conversion task transforming human speech into dog-like sounds, exploring the feasibility and challenges of non-human voice synthesis while preserving linguistic content.
Contribution
It proposes the first framework for human-to-dog voice conversion using non-parallel VC methods and evaluates different acoustic features, architectures, and training criteria.
Findings
Mel-spectrograms enhance dog-likeness of converted speech
Preserving linguistic information remains challenging
Different VC methods vary in sound quality and intelligibility
Abstract
This paper proposes a new voice conversion (VC) task, converting human speech into dog-like speech while preserving linguistic information, as an example of human to non-human creature voice conversion (H2NH-VC) tasks. Although most VC studies deal with human-to-human VC, H2NH-VC aims to convert human speech into non-human creature-like speech. Non-parallel VC makes H2NH-VC feasible, because we cannot collect a parallel dataset in which non-human creatures speak human language. In this study, we propose to use dogs as an example of a non-human creature target domain and define the "speak like a dog" task. To clarify the possibilities and characteristics of the "speak like a dog" task, we conducted a comparative experiment using existing representative non-parallel VC methods, varying acoustic features (Mel-cepstral coefficients and Mel-spectrograms), network architectures (five different kernel-size…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
