Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
Ilaria Manco, Justin Salamon, Oriol Nieto

TL;DR
This paper investigates key design choices in music-text contrastive learning, shows that data curation matters most when data and compute are limited, and introduces two techniques that increase the diversity of text inputs seen in training at no additional cost.
Contribution
It measures the impact of base encoders, data curation, and text augmentation on the quality of music-text representations, and proposes Augmented View Dropout and TextSwap to improve text diversity without extra computational cost or additional data.
Findings
Data curation is crucial in resource-limited settings.
Proposed techniques boost performance across models and datasets.
Methods do not increase computational costs or data requirements.
Abstract
Audio-text contrastive models have become a powerful approach in music representation learning. Despite their empirical success, however, little is known about the influence of key design choices on the quality of music-text representations learnt through this framework. In this work, we expose these design choices within the constraints of limited data and computation budgets, and establish a more solid understanding of their impact grounded in empirical observations along three axes: the choice of base encoders, the level of curation in training data, and the use of text augmentation. We find that data curation is the single most important factor for music-text contrastive training in resource-constrained scenarios. Motivated by this insight, we introduce two novel techniques, Augmented View Dropout and TextSwap, which increase the diversity and descriptiveness of text inputs seen in…
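For readers unfamiliar with the framework the abstract refers to, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective over paired audio and text embeddings. It is a generic illustration, not the authors' implementation: the function names, the temperature value of 0.07, and the toy embeddings are all assumptions, and the paper's exact loss and encoders are not given in this summary.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Hypothetical helper: mean of audio-to-text and text-to-audio losses.

    Row i of audio_emb and row i of text_emb are assumed to be a matching
    pair; every other row in the batch acts as a negative.
    The temperature default is an assumed value, not taken from the paper.
    """
    a = [normalize(v) for v in audio_emb]
    t = [normalize(v) for v in text_emb]
    n = len(a)
    # Cosine similarity between every audio item and every text item.
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def nll(scores, target):
        # Negative log-softmax of the target entry, computed stably.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        return log_z - scores[target]

    audio_to_text = sum(nll(logits[i], i) for i in range(n)) / n
    text_to_audio = sum(
        nll([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (audio_to_text + text_to_audio)


# Toy batch: three mutually orthogonal "audio" embeddings.
audio = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched_text = [row[:] for row in audio]        # captions aligned with audio
mismatched_text = audio[1:] + audio[:1]         # captions rotated by one

aligned_loss = symmetric_contrastive_loss(audio, matched_text)
mismatched_loss = symmetric_contrastive_loss(audio, mismatched_text)
```

In this toy setup the aligned batch yields a loss close to zero, while the rotated batch yields a much larger one, which is the behaviour a contrastive objective of this kind is meant to reward.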
Taxonomy
Topics: Music and Audio Processing · Natural Language Processing Techniques · Cancer-related molecular mechanisms research
Methods: Dropout · Balanced Selection
