Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

Aren Jansen; Daniel P. W. Ellis; Shawn Hershey; R. Channing Moore; Manoj Plakal; Ashok C. Popat; Rif A. Saurous

arXiv:1911.05894·cs.SD·November 15, 2019


TL;DR

This paper introduces a sound recognition framework inspired by human learning. It combines self-supervised learning, clustering, and active learning to produce state-of-the-art unsupervised audio representations and to recognize sounds with minimal supervision.

Contribution

It presents a new integrated learning approach that combines coincidence-based self-supervision, clustering, and targeted active learning for sound recognition.

Findings

01

Achieves state-of-the-art unsupervised audio representations.

02

Reduces the number of labels required by up to a factor of 20.

03

Demonstrates effective category consolidation with minimal supervision.

Abstract

Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and recognition that combines (i) a self-supervised objective based on a general notion of unimodal and cross-modal coincidence, (ii) a clustering objective that reflects our need to impose categorical structure on our experiences, and (iii) a cluster-based active learning procedure that solicits targeted weak supervision to consolidate categories into relevant semantic classes. By training a combined sound embedding/clustering/classification network according to these criteria, we achieve a new…
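
As a rough illustration of component (iii) only, the sketch below shows the general idea of cluster-based active learning: group the examples, ask for one label per group, and spread that label to the rest of the group. It is not the authors' method. The abstract does not specify the network or the querying procedure, so the plain k-means clustering, the farthest-point initialization, the `oracle` callback standing in for a human annotator, and all function names here are assumptions made for illustration.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids deterministically (assumed heuristic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point whose nearest chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means over a list of equal-length float tuples."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment so it is consistent with the returned centroids.
    return centroids, [nearest(p, centroids) for p in points]


def cluster_active_labels(points, k, oracle):
    """Query one label per cluster and propagate it to every member.

    `oracle(index)` is a hypothetical annotator returning the label of
    points[index]; it is called at most k times.
    """
    centroids, assign = kmeans(points, k)
    labels = [None] * len(points)
    queries = 0
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Query the member closest to the centroid as the representative.
        rep = min(members, key=lambda i: math.dist(points[i], centroids[c]))
        label = oracle(rep)
        queries += 1
        for i in members:
            labels[i] = label
    return labels, queries
```

When the clusters line up with the semantic classes, the number of annotator queries scales with the number of clusters rather than the number of examples, which is the intuition behind soliciting targeted labels per cluster.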
