ALCAP: Alignment-Augmented Music Captioner

Zihao He; Weituo Hao; Wei-Tsung Lu; Changyou Chen; Kristina Lerman; Xuchen Song

arXiv:2212.10901 · cs.SD · October 24, 2023 · 1 citation

TL;DR

This paper introduces ALCAP, a method that uses contrastive learning to align audio and lyrics for music captioning. The alignment improves cross-modal coherence and caption quality, and the method achieves state-of-the-art results on two music captioning datasets.

Contribution

The paper presents a new contrastive learning approach for multimodal alignment of audio and lyrics in music captioning, enhancing cross-modal coherence.

Findings

01. Achieves state-of-the-art performance on two datasets.
02. Demonstrates the effectiveness of multimodal alignment.
03. Provides theoretical and empirical validation.

Abstract

Music captioning has gained significant attention in the wake of the rising prominence of streaming media platforms. Traditional approaches often prioritize either the audio or lyrics aspect of the music, inadvertently ignoring the intricate interplay between the two. However, a comprehensive understanding of music necessitates the integration of both these elements. In this study, we delve into this overlooked realm by introducing a method to systematically learn multimodal alignment between audio and lyrics through contrastive learning. This not only recognizes and emphasizes the synergy between audio and lyrics but also paves the way for models to achieve deeper cross-modal coherence, thereby producing high-quality captions. We provide both theoretical and empirical results demonstrating the advantage of the proposed method, which achieves new state-of-the-art on two music captioning…
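
The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE-style loss over paired audio and lyrics embeddings. It is not taken from the paper or its repository: the function names, the temperature value, and the use of NumPy in place of a trained encoder are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def contrastive_alignment_loss(audio_emb, lyrics_emb, temperature=0.07):
    # Hypothetical helper: row i of audio_emb and row i of lyrics_emb are
    # assumed to come from the same song (the positive pair); every other
    # row in the batch serves as a negative.
    a = l2_normalize(audio_emb)
    t = l2_normalize(lyrics_emb)
    logits = a @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(a))
    loss_a2l = -log_softmax(logits)[idx, idx].mean()    # audio -> lyrics
    loss_l2a = -log_softmax(logits.T)[idx, idx].mean()  # lyrics -> audio
    return 0.5 * (loss_a2l + loss_l2a)

# Toy embeddings standing in for encoder outputs (assumed shapes: 4 songs, dim 8).
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
aligned = contrastive_alignment_loss(audio, audio + 0.01 * rng.normal(size=(4, 8)))
shuffled = contrastive_alignment_loss(audio, audio[::-1].copy())
print(aligned < shuffled)
```

In this toy setup the loss is lower when each lyrics embedding sits close to its own audio embedding than when the pairing is reversed, which is the behavior an alignment objective of this kind is meant to reward.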

Code & Models

Repositories

zihaohe123/alcap
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies