Remixing Music with Visual Conditioning

Li-Chia Yang; Alexander Lerch

arXiv:2010.14565 · cs.SD · October 29, 2020


TL;DR

This paper introduces a music remixing system that uses visual inputs, specifically user-selected images, to condition the separation and remixing of audio sources, and achieves better audio quality than the separate-and-add approach.

Contribution

It adapts an audio-visual source separation model to accept still images instead of video as visual input at inference time, and develops a remixing engine that generalizes source separation to music remixing with better audio quality than separate-and-add remixing.
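A minimal sketch of why a model built for video can also take a single image, assuming (hypothetically, since the architecture is not described here) that per-frame visual features are pooled over time into one vector, which then conditions a spectrogram mask. The function names, array shapes, max-pooling choice, and sigmoid conditioning are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pool_visual_features(frame_features: np.ndarray) -> np.ndarray:
    """Collapse per-frame visual features of shape (T, K) into one K-dim vector.

    Hypothetical pooling choice: temporal max pooling. For a single still
    image (T == 1) this reduces to the image's own feature vector.
    """
    if frame_features.ndim != 2:
        raise ValueError("expected shape (T, K)")
    return frame_features.max(axis=0)


def condition_mask(audio_features: np.ndarray, visual_vector: np.ndarray) -> np.ndarray:
    """Predict a soft mask of shape (F, T) from K audio feature maps (K, F, T).

    Hypothetical conditioning: weight each audio channel by the matching
    visual feature, sum over channels, and squash with a sigmoid.
    """
    logits = np.tensordot(visual_vector, audio_features, axes=([0], [0]))  # (F, T)
    return 1.0 / (1.0 + np.exp(-logits))


# Usage: a video clip (8 frames) and a single image share the same path.
rng = np.random.default_rng(0)
audio = rng.standard_normal((16, 32, 10))      # K=16 channels, 32 bins, 10 frames
video_vec = pool_visual_features(rng.standard_normal((8, 16)))
image_vec = pool_visual_features(rng.standard_normal((1, 16)))
mask_from_image = condition_mask(audio, image_vec)
```

With T = 1 the pooling step is the identity, so the same conditioning path serves both a video clip and a user-selected image.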

Findings

1. Achieves better audio quality than separate-and-add methods
2. Successfully uses images as visual conditioning for audio source separation
3. Extends audio-visual models to user-selected images for remixing

Abstract

We propose a visually conditioned music remixing system by incorporating deep visual and audio models. The method is based on a state-of-the-art audio-visual source separation model which performs musical instrument source separation with video information. We modified the model to work with user-selected images instead of videos as visual input during inference to enable separation of audio-only content. Furthermore, we propose a remixing engine that generalizes the task of source separation into music remixing. The proposed method achieves improved audio quality compared to remixing performed by the separate-and-add method with a state-of-the-art audio-visual source separation model.
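For reference, the separate-and-add baseline that the abstract compares against can be sketched as follows, assuming one soft magnitude mask per source (for example, one per visual condition) and user-chosen linear gains. The function name and shapes are hypothetical; this illustrates only the baseline, not the proposed remixing engine.

```python
import numpy as np


def separate_and_add(mixture_mag: np.ndarray, masks: np.ndarray, gains) -> np.ndarray:
    """Baseline remix: estimate each source with its mask, scale by a gain, sum.

    mixture_mag: (F, T) magnitude spectrogram of the mixture.
    masks:       (N, F, T) soft masks, one per source (hypothetical model output).
    gains:       (N,) linear gains chosen by the user; 0.0 mutes a source.
    """
    gains = np.asarray(gains, dtype=float)
    sources = masks * mixture_mag[None, :, :]                # separated estimates (N, F, T)
    return np.tensordot(gains, sources, axes=([0], [0]))     # gain-weighted sum -> (F, T)


# Usage: keep source 0, mute source 1.
rng = np.random.default_rng(1)
mix = rng.random((4, 5))
masks = rng.random((2, 4, 5))
remix = separate_and_add(mix, masks, [1.0, 0.0])
```

Because the estimates are summed linearly, applying the gains is equivalent to applying the single combined mask (the sum of g_i * M_i) to the mixture. With all gains at 1, the original mixture is recovered only if the masks sum to one at every time-frequency bin.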


Code & Models

Repositories

RichardYang40148/VAreMixer (PyTorch, official)


Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques