Labels, Information, and Computation: Efficient Learning Using Sufficient Labels
Shiyu Duan, Spencer Chang, and Jose C. Principe

TL;DR
This paper introduces the concept of sufficiently-labeled data, a minimal yet informative label summary that enables efficient learning and reduces labeling costs by capturing essential information for classification.
Contribution
The paper proposes a new label summary method inspired by statistical sufficiency, demonstrating its effectiveness and efficiency for training classifiers with minimal fully-labeled data.
Findings
Sufficiently-labeled data captures nearly all relevant information for classification.
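Once hidden representations are learned from sufficiently-labeled data, a competent classifier head can be trained using as few as one randomly-chosen fully-labeled example per class.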
Sufficiently-labeled data is easier and more secure to obtain than fully-labeled data.
Abstract
In supervised learning, obtaining a large set of fully-labeled training data is expensive. We show that we do not always need full label information on every single training example to train a competent classifier. Specifically, inspired by the principle of sufficiency in statistics, we present a statistic (a summary) of the fully-labeled training set that captures almost all the relevant information for classification but at the same time is easier to obtain directly. We call this statistic "sufficiently-labeled data" and prove its sufficiency and efficiency for finding the optimal hidden representations, on which competent classifier heads can be trained using as few as a single randomly-chosen fully-labeled example per class. Sufficiently-labeled data can be obtained from annotators directly without collecting the fully-labeled data first. And we prove that it is easier to directly…
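To make the idea of a label summary concrete, here is a minimal Python sketch. It assumes that the summary takes the form of pairs of examples tagged only with whether the two share a class; this summary page does not spell out the exact form, so treat that as an assumption. The helper names (`make_pairwise_labels`, `pick_one_per_class`) are hypothetical and are not from the paper. The sketch derives the pairs from full labels purely for illustration, whereas the abstract notes that such data can be collected from annotators directly.

```python
from itertools import combinations
import random


def make_pairwise_labels(labels):
    """Summarize full labels as (i, j, same_class) triples.

    Assumption: the summary is a same-class indicator per pair of examples.
    The class identities themselves are not kept in the output.
    """
    return [
        (i, j, int(labels[i] == labels[j]))
        for i, j in combinations(range(len(labels)), 2)
    ]


def pick_one_per_class(labels, seed=0):
    """Pick one randomly-chosen fully-labeled example index per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    return {y: rng.choice(idxs) for y, idxs in by_class.items()}


# Toy fully-labeled set, used here only to generate the summary.
full_labels = ["cat", "dog", "cat", "bird"]

# Pairwise summary: which examples share a class, without naming the class.
pairwise = make_pairwise_labels(full_labels)

# The few fully-labeled examples that would be kept for the classifier head.
anchors = pick_one_per_class(full_labels)
```

In this sketch, `pairwise` would stand in for the data used to learn hidden representations, and `anchors` would stand in for the single fully-labeled example per class used to train the classifier head.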
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Bandit Algorithms Research
