Increasing Information Extraction in Low-Signal Regimes via Multiple Instance Learning
Atakan Azakli, Bernd Stelzer

TL;DR
This paper presents a new information-theoretic approach to Multiple Instance Learning (MIL), demonstrating its advantage over single-instance methods in low-signal regimes, with applications to particle physics data from the LHC.
Contribution
It introduces an information-theoretic perspective on MIL for parameter estimation and shows MIL's superiority in low-signal scenarios compared to single-instance models.
Findings
MIL can outperform single-instance learners in low-signal regimes.
Pooling instances increases effective Fisher information.
The approach is applied to constraining SMEFT Wilson coefficients with LHC collision data.
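The pooling claim above can be illustrated with a toy calculation. The sketch below is not the paper's method or its SMEFT setup: it assumes a hypothetical one-dimensional Gaussian model in which each instance is drawn from N(theta, sigma^2), and the function names and the shift `delta = 0.05` are illustrative choices. Under that assumption, the Fisher information about theta grows linearly with the bag size, and the accuracy of the optimal likelihood-ratio test between theta = 0 and theta = delta rises from near chance as more i.i.d. instances are pooled.

```python
import math


def fisher_information_gaussian_mean(bag_size, sigma=1.0):
    """Fisher information about the mean carried by `bag_size`
    i.i.d. draws from N(theta, sigma^2). Information is additive
    for independent instances, so it scales linearly with bag size."""
    return bag_size / sigma ** 2


def optimal_accuracy(delta, bag_size, sigma=1.0):
    """Accuracy of the likelihood-ratio test between N(0, sigma^2)
    and N(delta, sigma^2) with equal priors, applied to a bag of
    `bag_size` i.i.d. draws. The sample mean is sufficient, so the
    accuracy is Phi(delta * sqrt(bag_size) / (2 * sigma))."""
    z = delta * math.sqrt(bag_size) / (2.0 * sigma)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    delta = 0.05  # assumed small shift, standing in for a low-signal regime
    for n in (1, 10, 100, 1000):
        info = fisher_information_gaussian_mean(n)
        acc = optimal_accuracy(delta, n)
        print(f"bag size {n:>4}: Fisher information {info:7.1f}, "
              f"optimal accuracy {acc:.3f}")
```

With a single instance the optimal classifier is barely better than a coin flip, which is the kind of regime in which a single-instance learner has very little gradient signal to train on. This toy model only shows the idealized information gain from pooling; it says nothing about how well a trained network approaches that limit.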
Abstract
In this work, we introduce a new information-theoretic perspective on Multiple Instance Learning (MIL) for parameter estimation with i.i.d. data, and show that MIL can outperform single-instance learners in low-signal regimes. Prior work [Nachman and Thaler, 2021] argued that single-instance methods are often sufficient, but this conclusion presumes enough single-instance signal to train near-optimal classifiers. We demonstrate that even state-of-the-art single-instance models can fail to reach optimal classifier performance in challenging low-signal regimes, whereas MIL can mitigate this sub-optimality. As a concrete application, we constrain Wilson coefficients of the Standard Model Effective Field Theory (SMEFT) using kinematic information from subatomic particle collision events at the Large Hadron Collider (LHC). In experiments, we observe that under specific modeling and weak…
Peer Reviews
Decision · Submitted to ICLR 2026
* Investigation of multiple instance learning in a new setting (low signal-to-noise ratios), which is of practical importance at the CERN LHC.
* Theoretical justification using effective Fisher information to explain why MIL practically improves on SIL for low SNRs.
* Comparison to multiple baselines, including parameterized neural networks, and in multiple settings, including binary and multi-class classification.
* Code is made public.

* Unclear if the data has been or will be made public.
* Some details on the training procedures are missing, e.g. how large is the training data set? How many epochs was each algorithm trained for? Were training hyperparameters, e.g., learning rate, optimized? Etc. Since part of the claims depend on (non)optimality of the models, these are important considerations.
I am not familiar with physics literature and hence cannot assess the significance of the paper within this field. From a general statistics perspective, especially in the context of data fusion, the contribution of the paper is a lightweight method, based on LRT, which fuses multiple instances at a feature level rather than at a decision level. Feature-level fusion is known to be superior to decision-level fusion, especially in low-SNR regimes, but is generally considered a complex task.
I wonder how novel or substantial the contribution is. As already mentioned, the fact that feature-level fusion is superior to decision-level fusion is well-known and intuitive. The use of NNs for estimating likelihood ratios is not entirely new and is extensively discussed in the context of neural ratio estimation (NRE). From the perspective of MIL, the paper considers a simplified scenario, which to me is a repetition of the standard point estimation theory with multiple observations. A major part of
The paper is well-written, and the information-theoretic motivation is sound and clearly described in the Reviewer's opinion. The empirical demonstration of performance degradation in single-instance models versus the MIL approach in the presence of background contamination is also a well-executed and easy-to-follow experiment. Additionally, the discovery that models violate the second Bartlett identity and the suggested fix are interesting results in themselves.
- The paper focuses on a single application only. While the application problem seems important, given that the authors introduce a general framework, the Reviewer would expect at least one other example to show the applicability of the approach. Especially as the authors claim a general-purpose framework, this should also be reflected in the experiment section.
- The technical contribution of the paper seems to be limited, the core architecture is a simple MLP, and the multiple instance learni
Taxonomy
Topics: Image Retrieval and Classification Techniques · Machine Learning and Data Classification · AI in cancer detection
