Multi-view Audio and Music Classification

Huy Phan; Huy Le Nguyen; Oliver Y. Chén; Lam Pham; Philipp Koch; Ian McLoughlin; Alfred Mertins

arXiv:2103.02420·cs.SD·March 4, 2021

TL;DR

This paper introduces a multi-view learning framework for audio and music classification that combines multiple low-level representations, with adaptive weighting to improve generalization and outperform existing methods.

Contribution

It presents a novel multi-view network with adaptive gradient blending for audio and music classification, enhancing performance over traditional single-view and multi-view baselines.

Findings

01 Outperforms single-view baselines

02 Superior to concatenation and late-fusion multi-view methods

03 Effective adaptive weighting improves learning behavior

Abstract

We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input type. The learned embeddings in the subnetworks are then concatenated to form the multi-view embedding for classification, similar to a simple concatenation network. However, apart from the joint classification branch, the network also maintains four classification branches on the single-view embeddings of the subnetworks. A novel method is then proposed to keep track of the learning behavior on the classification branches and adapt their weights to proportionally blend their gradients for network training. The weights are adapted in such a way that learning on a branch that is generalizing…
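The branch structure described in the abstract can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the embedding size, class count, batch size, linear classification heads, and random stand-in embeddings are all hypothetical, and the uniform weights are placeholders because the abstract as shown here is cut off before the weight-adaptation rule. The sketch only shows how four single-view losses and one joint loss can be combined with per-branch weights; since differentiation is linear, the gradient of such a weighted sum of losses equals the same weighted blend of the per-branch gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_VIEWS = 4      # four low-level representations, per the abstract
EMBED_DIM = 8      # hypothetical embedding size
NUM_CLASSES = 3    # hypothetical number of classes
BATCH = 5          # hypothetical batch size


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def blended_loss(view_embeddings, labels, view_heads, joint_head, weights):
    """Weighted sum of four single-view losses and one joint loss."""
    # Single-view branches: one (assumed linear) classifier per embedding.
    losses = [softmax_cross_entropy(e @ w, labels)
              for e, w in zip(view_embeddings, view_heads)]
    # Joint branch: classify the concatenated multi-view embedding.
    joint = np.concatenate(view_embeddings, axis=1)
    losses.append(softmax_cross_entropy(joint @ joint_head, labels))
    losses = np.array(losses)
    return float(weights @ losses), losses


# Random stand-ins for the subnetwork outputs (illustration only).
view_embeddings = [rng.normal(size=(BATCH, EMBED_DIM)) for _ in range(NUM_VIEWS)]
labels = rng.integers(0, NUM_CLASSES, size=BATCH)
view_heads = [rng.normal(size=(EMBED_DIM, NUM_CLASSES)) for _ in range(NUM_VIEWS)]
joint_head = rng.normal(size=(NUM_VIEWS * EMBED_DIM, NUM_CLASSES))

# Placeholder uniform weights; the paper adapts these during training.
weights = np.full(NUM_VIEWS + 1, 1.0 / (NUM_VIEWS + 1))

total, per_branch = blended_loss(
    view_embeddings, labels, view_heads, joint_head, weights
)
```

Here `per_branch` holds the four single-view losses followed by the joint loss, and `total` is their weighted combination. In an actual training loop the `weights` vector would be updated by the paper's adaptation method rather than held uniform.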
