Sketching With Your Voice: "Non-Phonorealistic" Rendering of Sounds via Vocal Imitation

Matthew Caren; Kartik Chandra; Joshua B. Tenenbaum; Jonathan Ragan-Kelley; Karima Ma

arXiv:2409.13507 · cs.GR · September 23, 2024

TL;DR

This paper introduces a method for generating human-like vocal imitations of sounds by tuning a simulated vocal tract model, incorporating communicative reasoning to better align with human perception and intuition.

Contribution

The paper combines perceptual feature matching with a cognitive model of communication, so that synthesized vocal imitations align more closely with human intuitions than feature matching alone.

Findings

01 Adding communicative reasoning improves alignment with human perception.

02 The method outperforms feature-only matching in user studies.

03 Vocal imitation quality benefits from strategic listener modeling.

Abstract

We present a method for automatically producing human-like vocal imitations of sounds: the equivalent of "sketching," but for auditory rather than visual representation. Starting with a simulated model of the human vocal tract, we first try generating vocal imitations by tuning the model's control parameters to make the synthesized vocalization match the target sound in terms of perceptually-salient auditory features. Then, to better match human intuitions, we apply a cognitive theory of communication to take into account how human speakers reason strategically about their listeners. Finally, we show through several experiments and user studies that when we add this type of communicative reasoning to our method, it aligns with human intuitions better than matching auditory features alone does. This observation has broad implications for the study of depiction in computer graphics.
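
The contrast between matching auditory features alone and reasoning about a listener can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a Rational Speech Acts style formulation (a common formalization of communicative reasoning, which the abstract does not name), and the similarity scores, the `rationality` parameter, and all function names are hypothetical. It only shows how a speaker who reasons about a listener may prefer a distinctive imitation over the closest feature match.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def literal_listener(similarity):
    """For each imitation i, return P(sound j | imitation i).

    similarity[i][j] is a made-up feature-match score between
    imitation i and candidate sound j (higher means closer).
    """
    return [softmax(row) for row in similarity]


def pragmatic_speaker(similarity, target, rationality=1.0):
    """Return P(imitation i | target sound).

    Assumption: the speaker prefers imitations that make a literal
    listener more likely to recover the intended target.
    """
    listener = literal_listener(similarity)
    utilities = [rationality * math.log(row[target]) for row in listener]
    return softmax(utilities)


# Hypothetical scores: rows are imitations, columns are candidate sounds.
similarity = [
    [3.0, 2.9],  # imitation 0: best match to sound 0, but nearly as close to sound 1
    [2.5, 0.0],  # imitation 1: slightly worse match to sound 0, clearly unlike sound 1
]
target = 0

# Feature-only matching picks the imitation with the highest score for the target.
feature_scores = [row[target] for row in similarity]
feature_only_choice = feature_scores.index(max(feature_scores))

# Communicative reasoning picks the imitation a listener is least likely to confuse.
speaker_probs = pragmatic_speaker(similarity, target)
pragmatic_choice = speaker_probs.index(max(speaker_probs))

print(feature_only_choice, pragmatic_choice)  # prints "0 1"
```

In this toy setup, feature matching selects imitation 0, while the listener-aware speaker selects imitation 1 because it is less ambiguous between the two candidate sounds.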

Code & Models

Repositories

matthewcaren/vocal-imitation
PyTorch (official)