A Multi-Dataset Benchmark of Multiple Instance Learning for 3D Neuroimage Classification
Ethan Harvey, Dennis Johan Loevlie, Amir Ali Satani, Wansu Chen, David M. Kent, Michael C. Hughes

TL;DR
This paper systematically compares simple MIL, attention-based MIL, 3D CNNs, and 3D ViTs for 3D neuroimage classification across three CT and four MRI datasets, and finds that simple mean pooling MIL is an efficient, competitive baseline.
Contribution
It provides a comprehensive benchmark of MIL, 3D CNNs, and 3D ViTs for 3D neuroimages, revealing that simple mean pooling MIL often matches or outperforms more complex methods.
Findings
Mean pooling MIL matches or outperforms alternatives on most tasks.
Simple MIL is 25x faster to train than 3D CNNs.
Analysis shows limits of current MIL approaches and suggests future directions.
Abstract
Despite being resource-intensive to train, 3D convolutional neural networks (CNNs) have been the standard approach to classify CT and MRI scans. Recent work suggests that deep multiple instance learning (MIL) may be a more efficient alternative for 3D brain scans, especially when the pre-trained image encoder used to embed each 2D slice is frozen and only the pooling operation and classifier are trained. In this paper, we provide a systematic comparison of simple MIL, attention-based MIL, 3D CNNs, and 3D ViTs across three CT and four MRI datasets, including two large datasets of at least 10,000 scans. Our goal is to help resource-constrained practitioners understand which neural networks work well for 3D neuroimages and why. We further compare design choices for attention-based MIL, including different encoders, pooling operations, and architectural orderings. We find that simple mean…
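The MIL setup described in the abstract (a frozen encoder embeds each 2D slice, and only the pooling operation and classifier are trained) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The random `slices` array stands in for frozen-encoder outputs, the attention scoring form (a tanh layer followed by a softmax over slices) is one common choice that the paper may or may not use, and every name and dimension below is hypothetical.

```python
import numpy as np

def mean_pool(slice_embeddings):
    """Average per-slice embeddings (num_slices, dim) into one scan vector (dim,)."""
    return slice_embeddings.mean(axis=0)

def attention_pool(slice_embeddings, V, w):
    """Weighted average of slice embeddings with learned attention weights.

    Assumed scoring form: score_k = w . tanh(V^T h_k), softmax over slices.
    """
    scores = np.tanh(slice_embeddings @ V) @ w        # (num_slices,)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # sums to 1 over slices
    return weights @ slice_embeddings                 # (dim,)

def classify(scan_embedding, W, b):
    """Linear classifier on the pooled scan embedding; returns a class index."""
    logits = scan_embedding @ W + b
    return int(np.argmax(logits))

# Hypothetical sizes; real encoders produce much larger embeddings.
rng = np.random.default_rng(0)
num_slices, dim, hidden, num_classes = 32, 16, 8, 2

# Stand-in for the output of a frozen, pre-trained 2D encoder applied per slice.
slices = rng.normal(size=(num_slices, dim))

# In training, only these parameters would be updated; here they are random.
V = rng.normal(size=(dim, hidden))
w = rng.normal(size=hidden)
W = rng.normal(size=(dim, num_classes))
b = np.zeros(num_classes)

mean_vec = mean_pool(slices)
attn_vec = attention_pool(slices, V, w)
pred_mean = classify(mean_vec, W, b)
pred_attn = classify(attn_vec, W, b)
```

Mean pooling has no trainable parameters of its own, so in that variant only the classifier is trained. With this scoring form, attention pooling reduces to mean pooling whenever all slices receive equal scores.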
