Multi-view Audio and Music Classification
Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Pham, Philipp Koch, Ian McLoughlin, Alfred Mertins

TL;DR
This paper introduces a multi-view learning framework for audio and music classification that combines four low-level representations and adaptively weights its classification branches to improve generalization, outperforming single-view, concatenation, and late-fusion baselines.
Contribution
It presents a novel multi-view network with adaptive gradient blending for audio and music classification, enhancing performance over traditional single-view and multi-view baselines.
Findings
Outperforms single-view baselines
Superior to concatenation and late fusion multi-view methods
Effective adaptive weighting improves learning behavior
Abstract
We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input type. The embeddings learned in the subnetworks are then concatenated to form the multi-view embedding for classification, similar to a simple concatenation network. However, apart from the joint classification branch, the network also maintains four classification branches on the single-view embeddings of the subnetworks. A novel method is then proposed to keep track of the learning behavior on the classification branches and adapt their weights to proportionally blend their gradients for network training. The weights are adapted in such a way that learning on a branch that is generalizing…
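The five-branch structure described in the abstract can be illustrated with a minimal sketch. It assumes linear classification heads and a cross-entropy loss on each branch; the function names, the head shapes, and the fixed `branch_weights` are hypothetical placeholders, not the authors' implementation. The rule that adapts the weights during training is not reproduced here, because the abstract is truncated before it is described. Since the gradient of a weighted sum of losses equals the weighted sum of the per-branch gradients, weighting the losses is one way to blend the gradients proportionally.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss for one example with an integer class label."""
    return -math.log(softmax(logits)[label])


def linear(head, x):
    """Apply a linear head (one weight row per class) to an embedding."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in head]


def blended_loss(view_embeddings, view_heads, joint_head, label, branch_weights):
    """Weighted sum of four single-view losses and one joint loss.

    view_embeddings: four embeddings, one per view (subnetwork output).
    view_heads:      four hypothetical linear heads, one per view.
    joint_head:      hypothetical linear head on the concatenated embedding.
    branch_weights:  five weights (four views, then joint); assumed given,
                     the paper's adaptation rule is NOT implemented here.
    """
    # Multi-view embedding: concatenation of the single-view embeddings.
    joint_embedding = [v for emb in view_embeddings for v in emb]

    # One loss per single-view branch, then the joint branch.
    losses = [
        cross_entropy(linear(head, emb), label)
        for head, emb in zip(view_heads, view_embeddings)
    ]
    losses.append(cross_entropy(linear(joint_head, joint_embedding), label))

    total = sum(w * l for w, l in zip(branch_weights, losses))
    return total, losses
```

In a training loop under these assumptions, `branch_weights` would be updated between steps according to how each branch is learning, and `total` would be the quantity that is backpropagated.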
