Prior shift estimation for positive unlabeled data through the lens of kernel embedding

Jan Mielniczuk; Wojciech Rejchel; Paweł Teisseyre

arXiv:2502.21194·stat.ML·May 22, 2026


TL;DR

This paper introduces a novel kernel embedding-based estimator of the class prior for positive-unlabeled (PU) data, addressing both prior shift and partial observability.

Contribution

It proposes a direct, geometrically interpretable estimator that avoids estimating posterior probabilities, with proven consistency and deviation bounds that are computable in practice.

Findings

01

The estimator performs on par with or better than competing methods.

02

It is asymptotically consistent with explicit deviation bounds.

03

It performs well on both synthetic and real data.

Abstract

We study estimation of the class prior of an unlabeled target sample, which may differ from that of the source population. Moreover, the source data are assumed to be partially observable: only samples from the positive class and from the whole population are available (the PU learning scenario). We introduce a novel direct estimator of the class prior which avoids estimating posterior probabilities in either population and has a simple geometric interpretation. It is based on a distribution matching technique combined with kernel embedding in a Reproducing Kernel Hilbert Space (RKHS) and is obtained as an explicit solution to an optimisation task. We establish its asymptotic consistency as well as an explicit non-asymptotic bound on its deviation from the unknown prior, which is calculable in practice. We study its finite-sample behaviour on synthetic and real data and show that the proposal works…
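The abstract describes the estimator only at a high level. As a rough illustration of the general idea it names, distribution matching with kernel mean embeddings where the minimiser has a closed form, the sketch below estimates a mixture weight θ in a simplified setting assumed here for illustration: samples are available from two component distributions A and B, and the target is modelled as θ·A + (1 − θ)·B. Minimising ||μ_T − θ·μ_A − (1 − θ)·μ_B||²_H over θ gives θ = ⟨μ_T − μ_B, μ_A − μ_B⟩_H / ||μ_A − μ_B||²_H, and every inner product reduces to an average of kernel values. This is not the authors' estimator (in the PU setting negative samples are not observed, and the paper's construction is not reproduced here); the function names, the Gaussian kernel and its bandwidth are hypothetical choices.

```python
import numpy as np


def gaussian_gram(X, Y, bandwidth):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))


def mean_kernel(X, Y, bandwidth):
    """Empirical <mu_X, mu_Y>_H: average kernel value over all sample pairs."""
    return gaussian_gram(X, Y, bandwidth).mean()


def mixture_weight(A, B, T, bandwidth=1.0):
    """Hypothetical helper: closed-form theta minimising
    ||mu_T - theta * mu_A - (1 - theta) * mu_B||_H^2, clipped to [0, 1]."""
    # Numerator: <mu_T - mu_B, mu_A - mu_B>_H
    num = (mean_kernel(T, A, bandwidth) - mean_kernel(T, B, bandwidth)
           - mean_kernel(B, A, bandwidth) + mean_kernel(B, B, bandwidth))
    # Denominator: ||mu_A - mu_B||_H^2
    den = (mean_kernel(A, A, bandwidth) - 2 * mean_kernel(A, B, bandwidth)
           + mean_kernel(B, B, bandwidth))
    return float(np.clip(num / den, 0.0, 1.0))


# Synthetic demo: well-separated Gaussian components, target with 30% from A.
rng = np.random.default_rng(0)
A = rng.normal(loc=2.0, size=(500, 1))
B = rng.normal(loc=-2.0, size=(500, 1))
true_theta = 0.3
n_a = int(true_theta * 1000)
T = np.vstack([rng.normal(loc=2.0, size=(n_a, 1)),
               rng.normal(loc=-2.0, size=(1000 - n_a, 1))])
print(mixture_weight(A, B, T))  # should be close to 0.3
```

The geometric reading matches the one the abstract alludes to: θ is the coordinate of the orthogonal projection of μ_T onto the line through μ_B and μ_A in the RKHS, which is why no posterior probabilities are needed.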

