Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

Daiki Takeuchi; Yasunori Ohishi; Daisuke Niizumi; Noboru Harada; Kunio Kashino

arXiv:2207.09732 · eess.AS · July 21, 2022

TL;DR

This paper introduces a content-based audio retrieval method that uses an auxiliary text description of the difference between the query and the target, so that it can retrieve audio that is similar to, but slightly different from, the query audio.

Contribution

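The paper adds the embedding of an auxiliary text query-modifier to the embedding of the query audio in a shared latent space, which lets the retrieval range be adjusted beyond audio that is merely similar to the query. To evaluate the method, the authors also built a dataset of audio clip pairs, each with a text describing the difference between the two clips.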

Findings

01

The method outperforms the baseline in retrieval accuracy.

02

Shared latent space effectively encodes audio differences and text.

03

Visualization confirms meaningful embedding alignment.

Abstract

The amount of audio data available on public websites is growing rapidly, and an efficient mechanism for accessing the desired data is necessary. We propose a content-based audio retrieval method that can retrieve a target audio that is similar to but slightly different from the query audio by introducing auxiliary textual information which describes the difference between the query and target audio. While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space. To evaluate our method, we built a dataset comprising two different audio clips and the text that describes the difference. The experimental results show that the proposed method retrieves the…
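The query-modification step described in the abstract can be illustrated with a minimal sketch. It assumes that audio and text encoders (not shown) already map into the same latent space; the hand-picked 3-dimensional vectors, the L2 normalization, the cosine-similarity ranking, and the function names `modify_query` and `rank_candidates` are illustrative assumptions, not details taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def modify_query(audio_emb, text_emb):
    """Add the text query-modifier embedding to the query audio embedding.

    Normalizing the sum is an assumption made here so that cosine
    similarity can be computed as a plain dot product.
    """
    return l2_normalize([a + t for a, t in zip(audio_emb, text_emb)])

def rank_candidates(query_emb, candidate_embs):
    """Rank candidates by cosine similarity to the (modified) query."""
    scores = [
        sum(q * c for q, c in zip(query_emb, l2_normalize(cand)))
        for cand in candidate_embs
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy embeddings (hypothetical values, standing in for encoder outputs).
query_audio = [1.0, 0.0, 0.0]   # embedding of the query audio clip
modifier = [0.0, 1.0, 0.0]      # embedding of the text describing the difference
candidates = [
    [1.0, 0.0, 0.0],            # clip identical to the query
    [1.0, 1.0, 0.0],            # clip matching query plus described difference
    [0.0, 0.0, 1.0],            # unrelated clip
]

modified_query = modify_query(query_audio, modifier)
order, scores = rank_candidates(modified_query, candidates)
print(order)  # candidate indices, best match first
```

In this toy setup the modified query ranks the "query plus difference" clip above the clip identical to the query, which is the behavior the abstract describes as adjusting the retrieval range.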

Code & Models

Repositories

nttcslab/apwd-dataset (Official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization