Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks

Yun Wang; Juncheng Li; Florian Metze

arXiv:1804.01146 · cs.SD · April 5, 2018 · 1 citation

TL;DR

This paper compares the max and noisy-or pooling functions in multiple instance learning for weakly supervised sequence learning tasks. On a speech recognition task and a sound event detection task, max pooling localizes phonemes and sound events, while noisy-or pooling fails to.

Contribution

It provides a theoretical explanation for the differing behaviors of max and noisy-or pooling functions in sequence learning tasks.

Findings

01

Max pooling effectively localizes phonemes and sound events.

02

Noisy-or pooling fails to localize events.

03

A theoretical analysis explains why the two pooling functions behave differently on sequence learning tasks.

Abstract

Many sequence learning tasks require the localization of certain events in sequences. Because it can be expensive to obtain strong labeling that specifies the starting and ending times of the events, modern systems are often trained with weak labeling without explicit timing information. Multiple instance learning (MIL) is a popular framework for learning from weak labeling. In a common scenario of MIL, it is necessary to choose a pooling function to aggregate the predictions for the individual steps of the sequences. In this paper, we compare the "max" and "noisy-or" pooling functions on a speech recognition task and a sound event detection task. We find that max pooling is able to localize phonemes and sound events, while noisy-or pooling fails. We provide a theoretical explanation of the different behavior of the two pooling functions on sequence learning tasks.
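
To make the two pooling functions in the abstract concrete, here is a minimal sketch using their standard definitions: max pooling takes the largest per-step probability, and noisy-or pooling computes one minus the product of the per-step "negative" probabilities. The function names `max_pool` and `noisy_or_pool` and the example probabilities are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's models, experiments, or theoretical analysis.

```python
import math

def max_pool(step_probs):
    """Sequence-level probability under max pooling: the largest per-step probability."""
    return max(step_probs)

def noisy_or_pool(step_probs):
    """Sequence-level probability under noisy-or pooling.

    The sequence is negative only if every step is negative, so
    P(positive) = 1 - prod(1 - p_i).
    """
    return 1.0 - math.prod(1.0 - p for p in step_probs)

# Hypothetical per-step predictions for one event class (illustrative values only).
step_probs = [0.1, 0.2, 0.9, 0.1]

print(max_pool(step_probs))       # 0.9
print(noisy_or_pool(step_probs))  # about 0.935
```

One property that follows directly from the formulas: the noisy-or output is never smaller than the max output, because every step contributes to it, whereas max pooling depends on a single step.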

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis