Audio Retrieval with Natural Language Queries: A Benchmark Study

A. Sophia Koepke; Andreea-Maria Oncescu; João F. Henriques; Zeynep Akata; Samuel Albanie

arXiv:2112.09418·eess.AS·February 11, 2022

TL;DR

This paper introduces three new benchmarks for cross-modal text-audio and audio-text retrieval tasks, enabling more effective search of audio content using natural language descriptions, and establishes baseline results demonstrating the benefits of pre-training.

Contribution

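The paper presents three challenging new benchmarks for text-audio and audio-text retrieval: two constructed from the existing AudioCaps and Clotho audio captioning datasets, and a third, SoundDescs, built from a new collection of paired audio and natural language descriptions. Together they are intended to facilitate future research in this area.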

Findings

01 Pre-training on diverse audio tasks improves retrieval performance.

02 The benchmarks enable standardized evaluation of text-audio retrieval methods.

03 Baseline results demonstrate the effectiveness of the proposed datasets.

Abstract

The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the goal is to retrieve the audio content from a pool of candidates that best matches a given written description and vice versa. Text-audio retrieval enables users to search large databases through an intuitive interface: they simply issue free-form natural language descriptions of the sound they would like to hear. To study the tasks of text-audio and audio-text retrieval, which have received limited attention in the existing literature, we introduce three challenging new benchmarks. We first construct text-audio and audio-text retrieval benchmarks from the AudioCaps and Clotho audio captioning datasets. Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in…
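The retrieval setup described in the abstract can be illustrated with a minimal sketch. It assumes that text and audio have already been mapped to vectors in a shared embedding space and that candidates are ranked by cosine similarity; the embeddings, the function names, and the Recall@K measure used here are illustrative assumptions, not the authors' model or the evaluation protocol stated in this excerpt.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates):
    """Return candidate indices ordered from best to worst match."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, candidates, k):
    """Fraction of queries whose paired candidate (same index) ranks in the top k."""
    hits = 0
    for i, query in enumerate(queries):
        if i in rank_candidates(query, candidates)[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings: text_emb[i] is the description paired with audio_emb[i].
text_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_emb = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]

# Text-audio retrieval: each description queries the pool of audio clips.
t2a_r1 = recall_at_k(text_emb, audio_emb, k=1)
# Audio-text retrieval: each clip queries the pool of descriptions.
a2t_r1 = recall_at_k(audio_emb, text_emb, k=1)
```

Swapping the roles of the two lists is all that separates the two directions of the task, which is why a single benchmark of paired audio and descriptions can serve both text-audio and audio-text retrieval.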

Code & Models

Repositories

akoepke/audio-retrieval-benchmark · PyTorch · Official