Improved Estimation of Class Prior Probabilities through Unlabeled Data
Norman Matloff

TL;DR
This paper introduces methods that use unlabeled data to improve the estimation of class prior probabilities. The estimators have lower asymptotic variance and extend to subclass probabilities, which is valuable when labels are costly or difficult to obtain.
Contribution
It develops both parametric and nonparametric estimators of class prior probabilities that use unlabeled data, proves that they reduce asymptotic variance, and extends them to subclass probabilities.
Findings
Unlabeled data reduces the asymptotic variance of class prior estimates.
The new estimators are most useful when labels are scarce, that is, when class membership is difficult or expensive to determine.
The method also applies to the estimation of subclass probabilities.
Abstract
Work in the classification literature has shown that in computing a classification function, one need not know the class membership of all observations in the training set; the unlabeled observations still provide information on the marginal distribution of the feature set, and can thus contribute to increased classification accuracy for future observations. The present paper will show that this scheme can also be used for the estimation of class prior probabilities, which would be very useful in applications in which it is difficult or expensive to determine class membership. Both parametric and nonparametric estimators are developed. Asymptotic distributions of the estimators are derived, and it is proven that the use of the unlabeled observations does reduce asymptotic variance. This methodology is also extended to the estimation of subclass probabilities.
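The abstract does not spell out the estimators, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's own construction. It assumes a logistic model for P(Y = 1 | X), fits it on the labeled observations, and then averages the fitted probabilities over labeled and unlabeled observations together to estimate the class prior P(Y = 1). The simulated data, the function names `fit_logistic` and `prior_with_unlabeled`, and the sample sizes are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=25):
    """Fit P(Y=1 | X) by Newton-Raphson. X must already include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible (an implementation convenience).
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def prior_with_unlabeled(X_lab, y_lab, X_unl):
    """Assumed plug-in estimator: average fitted P(Y=1 | X) over ALL feature vectors."""
    beta = fit_logistic(add_intercept(X_lab), y_lab)
    X_all = add_intercept(np.vstack([X_lab, X_unl]))
    return float(np.mean(sigmoid(X_all @ beta)))


# Simulated data (hypothetical): one feature, logistic class membership.
rng = np.random.default_rng(0)
n_lab, n_unl = 200, 5000
X_lab = rng.normal(size=(n_lab, 1))
X_unl = rng.normal(size=(n_unl, 1))
y_lab = (rng.random(n_lab) < sigmoid(-0.5 + X_lab[:, 0])).astype(float)

naive = float(y_lab.mean())                       # labeled observations only
est = prior_with_unlabeled(X_lab, y_lab, X_unl)   # labeled + unlabeled
print(f"labeled-only estimate:      {naive:.3f}")
print(f"with unlabeled observations: {est:.3f}")
```

In this sketch the unlabeled observations enter only through their feature values: they give a better picture of the marginal distribution of X, over which the fitted class probability is averaged. Whether this matches the paper's parametric estimator in detail is an assumption; a nonparametric variant would replace the logistic fit with a nonparametric regression estimate.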
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Advanced Statistical Process Monitoring
