Model positive and unlabeled data with a generalized additive density ratio model
Peijun Sang, Yifan Sun, Qinglong Tian, Donglin Zeng, Pengfei Li

TL;DR
This paper introduces a flexible generalized additive density ratio model for positive and unlabeled data, improving estimation accuracy when relationships are nonlinear, and providing a practical algorithm with theoretical guarantees.
Contribution
It proposes a novel generalized additive framework for PU learning that maintains identifiability and enhances flexibility over traditional linear models.
Findings
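Matches the standard exponential tilting method when the linear model is correct
Outperforms the linear exponential tilting model when relationships are nonlinear
Supports estimation and inference for the mixture proportion and other quantities of interest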
Abstract
We address learning from positive and unlabeled (PU) data, a common setting in which only some positives are labeled and the rest are mixed with negatives. Classical exponential tilting models guarantee identifiability by assuming a linear structure, but they can be badly misspecified when relationships are nonlinear. We propose a generalized additive density-ratio framework that retains identifiability while allowing smooth, feature-specific effects. The approach comes with a practical fitting algorithm and supporting theory that enables estimation and inference for the mixture proportion and other quantities of interest. In simulations and analyses of benchmark datasets, the proposed method matches the standard exponential tilting method when the linear model is correct and delivers clear gains when it is not. Overall, the framework strikes a useful balance between flexibility and…
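To make the contrast in the abstract concrete, the two model classes can be sketched as follows. The notation is an assumption for illustration and is not taken from the paper: $f_1$ and $f_0$ denote the feature densities of positives and negatives, $h$ the density of the unlabeled sample, $\pi$ the mixture proportion, and $g_j$ a smooth function of the $j$-th feature. The direction of the ratio and any constraints placed on the $g_j$ may differ in the paper itself.

```latex
% Unlabeled data as a mixture of positives and negatives (assumed notation)
h(x) = \pi f_1(x) + (1 - \pi) f_0(x), \qquad 0 < \pi < 1

% Linear exponential tilting: the log density ratio is linear in x
\log \frac{f_1(x)}{f_0(x)} = \alpha + \beta^{\top} x

% Generalized additive density ratio: one smooth effect per feature
\log \frac{f_1(x)}{f_0(x)} = \alpha + \sum_{j=1}^{p} g_j(x_j)
```

Under this reading, the linear model is the special case $g_j(x_j) = \beta_j x_j$, which is consistent with the abstract's statement that the proposed method matches exponential tilting when the linear model is correct.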
