Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests
Edward Raff, James Holt

TL;DR
This paper highlights that current deep MIL models often violate core assumptions, risking incorrect learning, and proposes an algorithmic unit test to detect such violations across models.
Contribution
It introduces a model-agnostic algorithmic unit test to identify violations of MIL assumptions in deep learning models.
Findings
All five evaluated models failed the proposed tests.
Models learn anti-correlated instances, violating MIL assumptions.
Running the test on a model shows whether it violates the MIL assumption, which can guide more reliable model design.
Abstract
Multiple Instance Learning (MIL) is a sub-domain of classification problems with positive and negative labels and a "bag" of inputs, where the label is positive if and only if a positive element is contained within the bag, and otherwise is negative. Training in this context requires associating the bag-wide label to instance-level information, and implicitly contains a causal assumption and asymmetry to the task (i.e., you can't swap the labels without changing the semantics). MIL problems occur in healthcare (one malignant cell indicates cancer), cyber security (one malicious executable makes an infected computer), and many other tasks. In this work, we examine five of the most prominent deep-MIL models and find that none of them respects the standard MIL assumption. They are able to learn anti-correlated instances, i.e., defaulting to "positive" labels until seeing a negative…
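The asymmetry described in the abstract can be made concrete with a small sketch. The code below is not the authors' unit test. It is a minimal illustration, under stated assumptions, of the property such a test would probe: under the standard MIL assumption, adding instances to a positive bag can never make it negative. All names here (`standard_mil_label`, `anti_correlated_label`, `flips_to_negative`) and the integer encoding of instances (`1` for a positive instance, `-1` for a "negative marker", `0` for background) are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence

Bag = Sequence[int]
Predictor = Callable[[Bag], bool]


def standard_mil_label(bag: Bag) -> bool:
    """Standard MIL assumption: positive iff at least one positive instance (1)."""
    return any(x == 1 for x in bag)


def anti_correlated_label(bag: Bag) -> bool:
    """Rule that violates MIL: positive by default until a negative marker (-1) appears."""
    return not any(x == -1 for x in bag)


def flips_to_negative(predict: Predictor, positive_bag: Bag, extra: Bag) -> bool:
    """Return True if adding instances turns a positive prediction negative.

    Under the standard MIL assumption this should never happen, so a True
    result signals a violation by the predictor.
    """
    if not predict(positive_bag):
        raise ValueError("positive_bag must be predicted positive to begin with")
    return not predict(list(positive_bag) + list(extra))


# The MIL-respecting rule stays positive no matter what is added to the bag.
mil_violation = flips_to_negative(standard_mil_label, [0, 1], [-1, 0])

# The anti-correlated rule starts positive and flips once a marker is added.
anti_violation = flips_to_negative(anti_correlated_label, [0, 0], [-1])
```

In this sketch `mil_violation` is `False` and `anti_violation` is `True`. A learned model could be passed in place of either rule, as long as it is wrapped in a callable that maps a bag to a boolean prediction.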
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications
