TL;DR
This paper introduces a neighbouring datasets framework to understand predictive multiplicity, revealing that greater inter-class overlap reduces multiplicity and proposing new multiplicity-aware methods for active learning and data imputation.
Contribution
The paper presents a novel neighbouring datasets perspective, linking data processing to model multiplicity, and develops new multiplicity-aware algorithms for active learning and data imputation.
Findings
Greater inter-class overlap leads to lower multiplicity.
First systematic study of multiplicity in existing active learning and data imputation algorithms.
Novel multiplicity-aware methods for data acquisition and imputation.
Abstract
Multiplicity, the existence of equally good yet competing models, has received growing attention in recent years. While prior work has emphasized modelling choices, the critical role of data in shaping multiplicity has been largely overlooked. In this work, we first introduce a neighbouring datasets framework, arguing that much of data processing can be reframed as choosing between neighbouring datasets. Under this framework, we find a counterintuitive theoretical relationship: neighbouring datasets with greater inter-class distribution overlap exhibit lower multiplicity. Building on this insight, we apply our framework to two domains: active learning and data imputation. For each, we establish natural extensions of the neighbouring datasets perspective, conduct the first systematic study of multiplicity in existing algorithms, and finally, propose novel multiplicity-aware methods,…
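To make "equally good yet competing models" concrete, here is a minimal sketch of one common multiplicity measure, ambiguity: the fraction of points on which at least one near-optimal model disagrees with the best model. This is not taken from the paper. The function names, the toy predictions and losses, and the tolerance `epsilon` are all illustrative assumptions.

```python
from typing import List, Sequence


def rashomon_set(losses: Sequence[float], epsilon: float) -> List[int]:
    """Indices of models whose loss is within epsilon of the best loss."""
    best = min(losses)
    return [i for i, loss in enumerate(losses) if loss <= best + epsilon]


def ambiguity(
    predictions: Sequence[Sequence[int]],
    losses: Sequence[float],
    epsilon: float,
) -> float:
    """Fraction of points where some near-optimal model disagrees with the best model."""
    members = rashomon_set(losses, epsilon)
    # The baseline is the single model with the lowest loss.
    best_index = min(range(len(losses)), key=lambda i: losses[i])
    baseline = predictions[best_index]
    n_points = len(baseline)
    conflicted = sum(
        any(predictions[m][j] != baseline[j] for m in members)
        for j in range(n_points)
    )
    return conflicted / n_points


# Hypothetical toy data: three models, four points.
predictions = [
    [0, 0, 1, 1],  # model 0 (lowest loss, used as baseline)
    [0, 1, 1, 1],  # model 1 (nearly as good, differs on one point)
    [1, 1, 0, 0],  # model 2 (much worse)
]
losses = [0.10, 0.11, 0.40]

print(ambiguity(predictions, losses, epsilon=0.02))
```

With a small tolerance only models 0 and 1 count as equally good, so a single conflicting point drives the score. Widening `epsilon` admits model 2 and raises the ambiguity.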
Peer Reviews
Decision·Submitted to ICLR 2026
The problem is well-motivated, and the paper is well-written and easy to follow. The notion of neighboring dataset connects many different aspects: Rashomon parameter, ambiguity, and active learning.
My main concern is the overlapping coefficient. The key parameter of the whole paper is this overlapping coefficient, defined in 4.1. However, the useful quantity is defined on the training data, i.e., $OVL_{train}^i$. As I understand it, for most datasets this quantity is simply zero. To see this, note that the quantity is defined on the empirical training distribution, so it is nonzero only when the training dataset contains >= 2 samples with the same x but different y, say Sample 1 (x,
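The reviewer's point about empirical overlap can be illustrated with a small sketch. The paper's exact definition from Section 4.1 is not reproduced here. The code assumes the standard overlapping coefficient applied to the class-conditional empirical distributions of a binary-labelled training set, that is, the sum over x of min(p̂(x | y=0), p̂(x | y=1)). The function name and toy samples are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def empirical_ovl(samples: Iterable[Tuple[Hashable, int]]) -> float:
    """Overlap of the two class-conditional empirical distributions (labels 0 and 1).

    Assumed definition: sum over x of min(p_hat(x | y=0), p_hat(x | y=1)).
    """
    counts = {0: Counter(), 1: Counter()}
    for x, y in samples:
        counts[y][x] += 1
    n0 = sum(counts[0].values())
    n1 = sum(counts[1].values())
    if n0 == 0 or n1 == 0:
        return 0.0
    # Only feature values seen under both labels contribute to the overlap.
    shared = counts[0].keys() & counts[1].keys()
    return sum(min(counts[0][x] / n0, counts[1][x] / n1) for x in shared)


# Hypothetical data: every feature vector is unique, so nothing is shared.
distinct = [((0.1,), 0), ((0.2,), 0), ((0.3,), 1), ((0.4,), 1)]

# Hypothetical data: the feature vector (0.2,) appears with both labels.
colliding = [((0.1,), 0), ((0.2,), 0), ((0.2,), 1), ((0.4,), 1)]

print(empirical_ovl(distinct))
print(empirical_ovl(colliding))
```

Under this assumed definition, the overlap is positive only when an identical feature vector appears with both labels, which is rare for continuous features.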
This paper addresses an interesting and practical problem encountered by many practitioners.
The presentation currently lacks rigor and precision. The theoretical framework would benefit from clearer definitions, more consistent notation, and stronger connections to related literature. Given that the authors claim their method to be efficient, it should be described in a much clearer and more structured manner, with each step of the procedure explicitly outlined and all underlying concepts rigorously defined.
I found the paper particularly clear, which is noteworthy given the number of notions introduced. I also appreciated the explicit discussion of assumptions and the careful formulation of the conjecture, reflecting a concern for transparency rather than an attempt to overstate the results. The related work is thoroughly and appropriately covered. The authors propose an original perspective on multiplicity by jointly considering the effects induced by both datasets and models. Interestingly, the
* **My main concern relates to the multiplicity-aware data acquisition and imputation strategies. These are designed to either minimise or maximise multiplicity. However, I wonder whether it makes sense to minimise or maximise multiplicity.** In my understanding, multiplicity is a diagnostic rather than an objective. Minimising multiplicity artificially suppresses legitimate uncertainty. Creating an imputation technique that leads to small downstream multiplicity means I am drawing robust concl
