Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning
Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes

TL;DR
This paper introduces a synthetic classification task to evaluate how well multiple instance learning methods capture contextual relationships, revealing significant generalization gaps in existing approaches.
Contribution
The paper designs a synthetic MIL benchmark in which correlations between instances determine the label, and demonstrates that current methods struggle to capture these relationships.
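The following is a minimal sketch of why instance correlation matters. It uses a hypothetical toy task, not the paper's benchmark. A bag is a sequence of scalar instance features, and the bag is positive only when two adjacent instances are both active. The names `instance_score`, `conventional_mil`, `contextual_mil`, and `true_label` are illustrative assumptions. The adjacent-pair product is just one simple way to inject context.

```python
from statistics import mean


def instance_score(x: float) -> float:
    """Hypothetical instance-level scorer: here simply the feature value itself."""
    return x


def conventional_mil(bag: list[float]) -> tuple[float, float]:
    """Score each instance independently, then pool with mean and max.
    Both poolings are permutation-invariant, so instance order is ignored."""
    scores = [instance_score(x) for x in bag]
    return mean(scores), max(scores)


def contextual_mil(bag: list[float]) -> float:
    """Hypothetical context-aware scorer: max over products of adjacent pairs,
    which is high only when two neighboring instances are both active."""
    return max((a * b for a, b in zip(bag, bag[1:])), default=0.0)


def true_label(bag: list[float]) -> int:
    """Toy labeling rule: positive iff two adjacent instances are both 1."""
    return int(any(a == 1 and b == 1 for a, b in zip(bag, bag[1:])))


positive = [1.0, 1.0, 0.0, 0.0]  # adjacent pair of 1s -> label 1
negative = [1.0, 0.0, 1.0, 0.0]  # same instances, reordered -> label 0

print(true_label(positive), true_label(negative))              # 1 0
print(conventional_mil(positive), conventional_mil(negative))  # (0.5, 1.0) (0.5, 1.0)
print(contextual_mil(positive), contextual_mil(negative))      # 1.0 0.0
```

The two bags contain the same multiset of instances, so any pooling over independently scored instances gives them identical scores even though their labels differ. Only a scorer that looks at neighbors can separate them.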
Findings
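Off-the-shelf MIL methods underperform the optimal Bayes estimator, which is available in closed form for this task.
Newer correlated MIL methods still fall short of optimal performance, even when trained with ten thousand training samples, each containing many instances.
Contextual relationships, such as the appearance of nearby patches or slices, can be essential for accurate MIL in medical imaging.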
Abstract
Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches or classify 3D volumes by processing slices. However, conventional MIL approaches treat instances separately, ignoring contextual relationships such as the appearance of nearby patches or slices that can be essential in real applications. We design a synthetic classification task where accounting for adjacent instance features is crucial for accurate prediction. We demonstrate the limitations of off-the-shelf MIL approaches by quantifying their performance compared to the optimal Bayes estimator for this task, which is available in closed-form. We empirically show that newer correlated MIL methods still do not achieve the best possible performance when trained with ten thousand training samples, each containing many instances.
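A closed-form Bayes estimator for a task that depends on adjacent instances can be illustrated with a hypothetical toy generative model. This is not the paper's task, whose exact generative model is not given here. Assume each instance has a hidden binary state z_i ~ Bernoulli(p), independent across instances. Each instance is observed as x_i = z_i + Gaussian noise with standard deviation sigma. The bag is positive iff two adjacent hidden states are both 1. Under these assumptions, the posterior over the z_i factorizes, so the Bayes estimator needs only per-instance posteriors plus a two-state forward recursion along the sequence. The values of `p` and `sigma` and all function names are illustrative assumptions.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def instance_posteriors(xs: list[float], p: float, sigma: float) -> list[float]:
    """q_i = P(z_i = 1 | x_i). Under the assumed toy model the z_i are
    independent a priori and x_i depends only on z_i, so the posterior factorizes."""
    qs = []
    for x in xs:
        on = p * gaussian_pdf(x, 1.0, sigma)
        off = (1 - p) * gaussian_pdf(x, 0.0, sigma)
        qs.append(on / (on + off))
    return qs


def bayes_prob_positive(xs: list[float], p: float = 0.3, sigma: float = 0.3) -> float:
    """P(y = 1 | x), where y = 1 iff some adjacent pair has z_i = z_{i+1} = 1.
    Forward recursion over the sequence:
      a = P(no adjacent 1s so far, current z = 0 | x)
      b = P(no adjacent 1s so far, current z = 1 | x)"""
    qs = instance_posteriors(xs, p, sigma)
    a, b = 1 - qs[0], qs[0]
    for q in qs[1:]:
        # z_i = 0 may follow any valid prefix; z_i = 1 must follow z_{i-1} = 0.
        a, b = (a + b) * (1 - q), a * q
    return 1.0 - (a + b)


def bayes_predict(xs: list[float], p: float = 0.3, sigma: float = 0.3) -> int:
    """Bayes-optimal decision under 0-1 loss: predict the more probable label."""
    return int(bayes_prob_positive(xs, p, sigma) > 0.5)


print(bayes_predict([0.9, 1.1, 0.1, -0.1]))  # 1: two adjacent high values
print(bayes_predict([0.9, 0.1, 1.1, -0.1]))  # 0: same values, reordered
```

The two example bags contain the same instance values and differ only in order. The Bayes rule still flips its prediction, because it uses exactly the adjacency information that order-agnostic pooling discards.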
