Automated Audio Captioning and Language-Based Audio Retrieval

Clive Gomes; Hyejin Park; Patrick Kollman; Yi Song; Iffanice Houndayi; Ankit Shah

arXiv:2207.04156 · cs.SD · May 16, 2023


TL;DR

This paper describes the authors' participation in Task 6 of the DCASE 2022 Competition, which covers automated audio captioning and language-based audio retrieval. The experiments modify the baseline models and produce results that are close to the baselines (captioning) or above them (retrieval).

Contribution

It introduces modified models for audio captioning and retrieval, with the retrieval model surpassing baseline performance in the competition.

Findings

01. The retrieval model outperformed the baseline in DCASE 2022.

02. The captioning model achieved performance close to the baseline.

03. Modifying the baseline models improved retrieval performance, while captioning performance stayed near the baseline.

Abstract

This project involved participation in the DCASE 2022 Competition (Task 6), which had two subtasks: (1) Automated Audio Captioning and (2) Language-Based Audio Retrieval. The first subtask involves generating a textual description for an audio sample. The second involves finding the audio samples within a fixed dataset that match a given description. Both subtasks use the Clotho dataset. Captioning models were evaluated on BLEU-1, BLEU-2, BLEU-3, ROUGE-L, METEOR, CIDEr, SPICE, and SPIDEr scores. Retrieval models were evaluated on R@1, R@5, R@10, and mAP@10 scores. We conducted several experiments that modify the baseline models for these tasks. Our final architecture for Automated Audio Captioning performs close to the baseline, while our model for Language-Based Audio Retrieval surpasses its baseline counterpart.
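The retrieval metrics named in the abstract can be illustrated with a minimal sketch. It rests on two simplifying assumptions. First, each caption query has exactly one matching Clotho clip. Second, text and audio have already been mapped into a shared embedding space and are ranked by cosine similarity. The embeddings and the function names `rank_audio`, `retrieval_metrics`, and `spider` are hypothetical and are not taken from the authors' repository. The sketch also shows SPIDEr, the captioning score defined as the average of CIDEr and SPICE.

```python
import numpy as np


def rank_audio(text_emb: np.ndarray, audio_embs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return audio indices sorted by cosine similarity
    to a single text query embedding, best match first."""
    text = text_emb / np.linalg.norm(text_emb)
    audio = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    scores = audio @ text
    return np.argsort(-scores, kind="stable")


def retrieval_metrics(ranks: np.ndarray) -> dict:
    """Compute R@1, R@5, R@10 and mAP@10 from the 1-based rank of the
    single relevant clip for each query (one-relevant-item assumption)."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {f"R@{k}": float(np.mean(ranks <= k)) for k in (1, 5, 10)}
    # With one relevant item per query, AP@10 is 1/rank inside the top 10, else 0.
    ap10 = np.where(ranks <= 10, 1.0 / ranks, 0.0)
    metrics["mAP@10"] = float(np.mean(ap10))
    return metrics


def spider(cider: float, spice: float) -> float:
    """SPIDEr: the arithmetic mean of the CIDEr and SPICE scores."""
    return (cider + spice) / 2.0


if __name__ == "__main__":
    # Toy embeddings: clip 1 is the true match for the query.
    order = rank_audio(np.array([1.0, 0.0]),
                       np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))
    rank_of_true_clip = int(np.where(order == 1)[0][0]) + 1  # 1-based rank
    print(order, rank_of_true_clip)
    # Ranks of the true clip for four toy queries.
    # R@1 = 0.25, R@5 = R@10 = 0.75, mAP@10 is about 0.458
    print(retrieval_metrics(np.array([1, 3, 12, 2])))
```

With one relevant clip per query, average precision at 10 reduces to the reciprocal of that clip's rank, or 0 if the clip falls outside the top 10. As a result, mAP@10 rewards placing the correct clip first more strongly than R@10 does, because R@10 treats ranks 1 through 10 the same.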


Code & Models

Repositories

patrickkollman/deep-learning-final-project
PyTorch · Official


Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing