Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images

Hien Ohnaka; Shinnosuke Takamichi; Keisuke Imoto; Yuki Okamoto; Kazuki Fujii; Hiroshi Saruwatari

arXiv:2210.09173 · cs.SD · October 18, 2022

Open Access

TL;DR

This paper introduces a method for synthesizing environmental sounds from visual onomatopoeias and sound-source images, using information carried by the visual rendering of the text (which plain text lacks) to synthesize more diverse sounds.

Contribution

It proposes visual onoma-to-wave, a method that synthesizes environmental sounds from visual onomatopoeias and sound-source images, together with a data augmentation method that improves synthesis performance.

Findings

1. Effective synthesis of diverse environmental sounds from visual inputs.
2. Visual onomatopoeias contain rich information that supports sound diversity.
3. Data augmentation improves synthesis performance.

Abstract

We propose a method for synthesizing environmental sounds from visually represented onomatopoeias and sound sources. An onomatopoeia is a word that imitates the structure of a sound, i.e., a textual representation of sound. From this perspective, onoma-to-wave has been proposed to synthesize environmental sounds from desired onomatopoeia texts. Onomatopoeias also have another representation: visual-text renderings of sounds in comics, advertisements, and virtual reality. A visual onomatopoeia (the visual text of an onomatopoeia) contains rich information that is absent from the plain text, such as the long or short duration conveyed by the image, so using this representation is expected to enable the synthesis of more diverse sounds. Therefore, we propose visual onoma-to-wave for environmental sound synthesis from visual onomatopoeias. The method can transfer visual concepts of the visual text and sound-source image to the…
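To make the abstract's point about duration concrete, here is a toy illustration. It is a hypothetical sketch, not the authors' model: it assumes a visual onomatopoeia can be reduced to its text plus a horizontal stretch factor, a fixed glyph width of 16 px, and a linear width-to-duration rule (`px_per_second`). All names (`VisualOnomatopoeia`, `render_width`, `toy_duration_s`) are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical toy model (not the paper's method): a visual onomatopoeia is
# reduced to its text plus the horizontal stretch used when drawing it,
# e.g. an elongated "don" in a comic panel.
@dataclass(frozen=True)
class VisualOnomatopoeia:
    text: str
    stretch: float  # horizontal scale of the rendered glyphs (1.0 = normal)


def render_width(v: VisualOnomatopoeia, glyph_px: int = 16) -> int:
    """Pixel width of the text image, assuming a fixed-width font."""
    return round(len(v.text) * glyph_px * v.stretch)


def toy_duration_s(v: VisualOnomatopoeia, px_per_second: float = 128.0) -> float:
    """Assumed linear mapping from image width to sound duration in seconds."""
    return render_width(v) / px_per_second


short = VisualOnomatopoeia("don", stretch=1.0)
long_ = VisualOnomatopoeia("don", stretch=3.0)

# Same text, so a text-only input would be identical; the images differ.
for v in (short, long_):
    print(v.text, render_width(v), toy_duration_s(v))
# don 48 0.375
# don 144 1.125
```

Both inputs share the text `"don"`, so a text-only model would receive identical input, while the rendered widths and toy durations differ. The linear rule is chosen only for readability; per the abstract, visual onoma-to-wave takes the visual onomatopoeia and sound-source images themselves as input rather than a hand-set rule like this one.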

Taxonomy

Topics: Music and Audio Processing · Digital Storytelling and Education · Subtitles and Audiovisual Media