Environmental Sound Extraction Using Onomatopoeic Words

Yuki Okamoto; Shota Horiguchi; Masaaki Yamamoto; Keisuke Imoto; Yohei Kawaguchi

arXiv:2112.00209 · cs.SD · February 18, 2022

TL;DR

This paper proposes a method for extracting a specific environmental sound from an audio mixture, using an onomatopoeic word to specify the target. A U-Net estimates a time-frequency mask from the mixture spectrogram and the word, and the mask is applied to the spectrogram to recover the target sound.

Contribution

The paper presents a sound extraction technique that uses onomatopoeic words to specify the target sound, and reports that it performs better than conventional methods that specify the target by sound-event class.

Findings

01 Effective extraction of target sounds using onomatopoeic words

02 Outperforms conventional sound-event class-based methods

03 Demonstrates the feasibility of linguistic cues in sound separation

Abstract

An onomatopoeic word, which is a character sequence that phonetically imitates a sound, is effective in expressing characteristics of sound such as duration, pitch, and timbre. We propose an environmental-sound-extraction method using onomatopoeic words to specify the target sound to be extracted. By this method, we estimate a time-frequency mask from an input mixture spectrogram and an onomatopoeic word using a U-Net architecture, then extract the corresponding target sound by masking the spectrogram. Experimental results indicate that the proposed method can extract only the target sound corresponding to the onomatopoeic word and performs better than conventional methods that use sound-event classes to specify the target sound.
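The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the U-Net mask estimator is not reproduced here, and an ideal ratio mask computed from known sources stands in for it. The function names `apply_mask` and `ideal_ratio_mask`, the toy array shapes, and the assumption that magnitudes add in the mixture are all hypothetical choices made for illustration.

```python
import numpy as np


def apply_mask(mixture_spec, mask):
    """Extract a target by element-wise masking of a (freq, time) magnitude spectrogram."""
    if mixture_spec.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    # Keep mask values in [0, 1] so each bin is attenuated, never amplified.
    return np.clip(mask, 0.0, 1.0) * mixture_spec


def ideal_ratio_mask(target_spec, interference_spec, eps=1e-8):
    """Stand-in for a learned mask estimator (assumption: NOT the paper's U-Net)."""
    return target_spec / (target_spec + interference_spec + eps)


# Toy magnitude spectrograms with 4 frequency bins and 6 time frames (hypothetical data).
rng = np.random.default_rng(0)
target = rng.random((4, 6))
interference = rng.random((4, 6))

# Simplifying assumption: magnitudes of the two sources add in the mixture.
mixture = target + interference

mask = ideal_ratio_mask(target, interference)
extracted = apply_mask(mixture, mask)
```

In the method the abstract describes, the mask would instead be predicted from the mixture spectrogram and the onomatopoeic word; only the final element-wise multiplication is shared with this sketch.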

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net