Learning from N-Tuple Data with M Positive Instances: Unbiased Risk Estimation and Theoretical Guarantees
Miao Zhang, Junpeng Li, Changchun Hua, Yana Yang

TL;DR
This paper introduces an unbiased risk estimator for learning from n-tuple data in which only the number of positives per tuple is observed, provides theoretical guarantees, and reports improved performance over existing weak supervision methods.
Contribution
The authors develop an unbiased risk estimator for NTMP (N-tuple with M positives) supervision, extend it to variable tuple sizes and variable counts, and prove consistency and generalization bounds.
Findings
Outperforms baseline weak supervision methods on NTMP tasks
Remains robust under class imbalance and diverse tuple configurations
Provides theoretical guarantees including consistency and generalization bounds
Abstract
Weakly supervised learning often operates with coarse aggregate signals rather than instance labels. We study a setting where each training example is an n-tuple containing exactly m positives, while only the count m per tuple is observed. This NTMP (N-tuple with M positives) supervision arises in, e.g., image classification with region proposals and multi-instance measurements. We show that tuple counts admit a trainable unbiased risk estimator (URE) by linking the tuple-generation process to latent instance marginals. Starting from fixed (n,m), we derive a closed-form URE and extend it to variable tuple sizes, variable counts, and their combination. Identification holds whenever the effective mixing rate is separated from the class prior. We establish generalization bounds via Rademacher complexity and prove statistical consistency with standard rates under mild regularity…
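The identification condition in the abstract can be illustrated with a small sketch. It is not the paper's estimator: it assumes that an instance drawn uniformly from a tuple follows a mixture of the positive and negative class-conditionals with rate m/n (one natural reading of "effective mixing rate"), and that a second quantity mixed at the class prior is available. The function name `unmix_class_means` and all numeric values are hypothetical. Under these assumptions, the two mixtures form a 2x2 linear system whose determinant is exactly the gap between the mixing rate and the class prior, so the class-conditional expectations are recoverable only when that gap is nonzero.

```python
def unmix_class_means(mean_tuple, mean_marginal, alpha, prior):
    """Recover class-conditional means from two mixture means.

    Assumed model (illustrative, not taken from the paper):
      mean_tuple    = alpha * mu_pos + (1 - alpha) * mu_neg
      mean_marginal = prior * mu_pos + (1 - prior) * mu_neg
    """
    # Determinant of [[alpha, 1-alpha], [prior, 1-prior]] simplifies to alpha - prior.
    det = alpha - prior
    if abs(det) < 1e-12:
        raise ValueError("not identifiable: mixing rate equals class prior")
    mu_pos = ((1 - prior) * mean_tuple - (1 - alpha) * mean_marginal) / det
    mu_neg = (alpha * mean_marginal - prior * mean_tuple) / det
    return mu_pos, mu_neg


# Hypothetical setup: tuples of size n=5 with m=3 positives, class prior 0.3.
n, m, prior = 5, 3, 0.3
alpha = m / n  # assumed effective mixing rate

# Hypothetical expected losses on the positive and negative class.
true_pos, true_neg = 0.2, 1.5
mean_tuple = alpha * true_pos + (1 - alpha) * true_neg
mean_marginal = prior * true_pos + (1 - prior) * true_neg

mu_pos, mu_neg = unmix_class_means(mean_tuple, mean_marginal, alpha, prior)
risk = prior * mu_pos + (1 - prior) * mu_neg  # classification risk from recovered terms
```

In this toy setting the recovered values match the class-conditional expectations used to build the mixtures, and the function raises an error when `alpha` equals `prior`, mirroring the separation requirement stated in the abstract.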
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques · Machine Learning and Data Classification
