MULTIMODAL ANALYSIS: Informed content estimation and audio source separation

Gabriel Meseguer-Brocal

arXiv:2104.13276·cs.SD·November 1, 2021

Open Access

TL;DR

This paper explores multimodal learning combining audio signals and lyrics to improve music source separation and content estimation, emphasizing the unique connection between singing voice, melody, and lyrics.

Contribution

It introduces a novel approach focusing on the interaction between audio and lyrics for enhanced source separation and content estimation in musical signals.

Findings

01

Improved source separation accuracy using lyrics information

02

Enhanced content estimation through multimodal analysis

03

Demonstrated the effectiveness of combining audio and text data

Abstract

This dissertation proposes the study of multimodal learning in the context of musical signals. Throughout, we focus on the interaction between audio signals and text information. Among the many text sources related to music that can be used (e.g. reviews, metadata, or social network feedback), we concentrate on lyrics. The singing voice directly connects the audio signal and the text information in a unique way, combining melody and lyrics where a linguistic dimension complements the abstraction of musical instruments. Our study focuses on the audio and lyrics interaction for targeting source separation and informed content estimation.

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis