Proper Learnability and the Role of Unlabeled Data
Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng

TL;DR
This paper investigates the conditions under which proper learning is feasible, demonstrating that unlabeled data offers limited benefits in worst-case scenarios and revealing fundamental limitations and undecidability issues in proper PAC learning.
Contribution
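It introduces the distribution-fixed PAC model, in which the unlabeled data distribution is given to the learner, and shows that an optimal proper learner governed by distributional regularization always exists in this model. It also provides new impossibility results, including undecidability and non-monotonicity of proper learnability.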
Findings
Unlabeled data only marginally reduces sample complexity in worst-case PAC learning.
Proper learnability can be undecidable and is not a monotone or local property.
Impossibility results hold even for multiclass classification, linking to EMX learning.
Abstract
Proper learning refers to the setting in which learners must emit predictors in the underlying hypothesis class, and often leads to learners with simple algorithmic forms (e.g., empirical risk minimization (ERM), structural risk minimization (SRM)). The limitation of proper learning, however, is that there exist problems which can only be learned improperly, e.g., in multiclass classification. Thus, we ask: Under what assumptions on the hypothesis class or the information provided to the learner is a problem properly learnable? We first demonstrate that when the unlabeled data distribution is given, there always exists an optimal proper learner governed by distributional regularization, a randomized generalization of regularization. We refer to this setting as the distribution-fixed PAC model, and continue to evaluate the learner on its worst-case performance over all distributions.…
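To make the notion of a proper learner concrete, here is a minimal sketch of ERM over a small finite hypothesis class. The threshold classifiers, the toy sample, and all function names are illustrative assumptions and do not come from the paper. The point is only that ERM is proper by construction, since the predictor it returns is always a member of the class it searches.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Callable[[float], int]
Sample = Sequence[Tuple[float, int]]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical binary classifier: predicts 1 when x >= t, else 0."""
    return lambda x: int(x >= t)


def empirical_risk(h: Hypothesis, sample: Sample) -> float:
    """Fraction of labeled examples that h misclassifies (0-1 loss)."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def erm(hypotheses: List[Hypothesis], sample: Sample) -> int:
    """Return the index of the first hypothesis with minimal empirical risk.

    The output always points at a member of `hypotheses`, which is what
    makes this learner proper.
    """
    risks = [empirical_risk(h, sample) for h in hypotheses]
    return min(range(len(hypotheses)), key=risks.__getitem__)


# Toy hypothesis class and labeled sample (assumed for illustration).
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
H = [make_threshold(t) for t in thresholds]
sample = [(0.1, 0), (0.3, 0), (0.6, 1), (0.9, 1)]

best = erm(H, sample)
print(thresholds[best], empirical_risk(H[best], sample))
```

An improper learner, by contrast, would be free to return any function of the input, including one outside `H`. The distributional regularization described in the abstract is a randomized generalization of regularization and is not modeled by this sketch.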
Peer Reviews
Decision·ALT 2025
Taxonomy
Topics: Educational Assessment and Pedagogy
