Language-based Audio Retrieval Task in DCASE 2022 Challenge

Huang Xie; Samuel Lipping; Tuomas Virtanen

arXiv:2206.06108 · eess.AS · October 6, 2022

TL;DR

This paper discusses the development and evaluation of systems for language-based audio retrieval in the DCASE 2022 Challenge, focusing on ranking audio signals based on natural language queries.

Contribution

It introduces the language-based audio retrieval task (Subtask 6B) in DCASE 2022 and analyzes the performance of the submitted systems, which significantly outperformed the baseline.

Findings

01 Top system achieved 0.276 mAP@10
02 Significant improvement over the baseline system
03 Analysis of system performance and approaches
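
The mAP@10 figure can be illustrated with a short sketch. It assumes that AP@10 is computed per caption query by averaging precision@i over the top-10 ranks i that hold a relevant clip, then dividing by min(number of relevant clips, 10). Other normalizations exist, but when a query has exactly one relevant clip they all reduce to 1/rank inside the top 10 and 0 otherwise. The query and clip identifiers are hypothetical, and this is not the official DCASE evaluation code.

```python
from typing import Dict, List, Sequence, Set


def average_precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """AP@k for one caption query.

    Averages precision@i over the ranks i <= k that hold a relevant clip and
    divides by min(len(relevant_ids), k). This normalization is an assumption;
    with a single relevant clip it equals 1 / rank (or 0 if outside the top k).
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, clip_id in enumerate(ranked_ids[:k], start=1):
        if clip_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant_ids), k)


def mean_average_precision_at_k(
    rankings: Dict[str, List[str]], relevance: Dict[str, Set[str]], k: int = 10
) -> float:
    """Mean of AP@k over all caption queries in `rankings`."""
    scores = [average_precision_at_k(rankings[q], relevance[q], k) for q in rankings]
    return sum(scores) / len(scores)


# Hypothetical caption queries, each with exactly one relevant clip.
rankings = {
    "q1": ["a1", "a2", "a3"],                     # relevant clip at rank 1
    "q2": ["a2", "a3", "a5", "a4"],               # relevant clip at rank 4
    "q3": [f"x{i}" for i in range(11)] + ["a9"],  # relevant clip at rank 12
}
relevance = {"q1": {"a1"}, "q2": {"a4"}, "q3": {"a9"}}

print(round(mean_average_precision_at_k(rankings, relevance), 3))  # 0.417
```

With the relevant clip at ranks 1, 4 and 12, the per-query scores are 1, 0.25 and 0, so mAP@10 is about 0.417. A clip ranked just outside the top 10 contributes nothing, which makes the metric sensitive to early ranks.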

Abstract

Language-based audio retrieval is a task where natural language textual captions are used as queries to retrieve audio signals from a dataset. It was introduced for the first time in the DCASE 2022 Challenge as Subtask 6B of Task 6, which aims at developing computational systems that model relationships between audio signals and free-form textual descriptions. Compared with audio captioning (Subtask 6A), which is about generating captions for audio signals, language-based audio retrieval (Subtask 6B) focuses on ranking audio signals according to their relevance to natural language textual captions. In the DCASE 2022 Challenge, the submitted systems significantly outperformed the provided baseline system for Subtask 6B, with the top system reaching 0.276 mAP@10. This paper presents the outcome of Subtask 6B, covering the performance of the submitted systems and an analysis of their approaches.
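
The ranking step can be sketched under a common design, which is assumed here rather than taken from the paper. Captions and audio clips are mapped into a shared embedding space, and clips are sorted by cosine similarity to the caption embedding. The toy three-dimensional vectors and clip names are hypothetical stand-ins for encoder outputs. They do not come from the baseline or from any submitted system.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_audio(
    query_emb: Sequence[float], audio_embs: Dict[str, Sequence[float]], k: int = 10
) -> List[str]:
    """Return the ids of the k clips most similar to the caption embedding."""
    ranked = sorted(
        audio_embs.items(), key=lambda item: cosine(query_emb, item[1]), reverse=True
    )
    return [clip_id for clip_id, _ in ranked[:k]]


# Hypothetical embeddings standing in for text- and audio-encoder outputs.
query = [1.0, 0.0, 0.5]
audio_embs = {
    "rain.wav": [0.9, 0.1, 0.4],
    "dog_bark.wav": [0.0, 1.0, 0.0],
    "traffic.wav": [0.5, 0.5, 0.5],
}

print(rank_audio(query, audio_embs))  # ['rain.wav', 'traffic.wav', 'dog_bark.wav']
```

Sorting every clip is adequate for a small evaluation set. For large collections, a partial selection such as `heapq.nlargest` avoids a full sort when only the top 10 are needed.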

Code & Models

Repositories

xieh97/dcase2022-audio-retrieval (PyTorch)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Diverse Musicological Studies