Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
Jialei Liu, Jun Liao, Kuangnan Fang

TL;DR
This paper introduces a transfer learning framework using model averaging for positive-unlabeled data, effectively integrating heterogeneous sources without data sharing, and providing theoretical guarantees and superior empirical performance.
Contribution
It proposes a novel model averaging approach for PU learning that handles heterogeneous data sources and offers theoretical and empirical validation.
Findings
Outperforms existing methods in predictive accuracy.
Provides theoretical guarantees for weight optimality.
Effective in high-dimensional and limited data scenarios.
Abstract
Positive-Unlabeled (PU) learning presents unique challenges because explicitly labeled negative samples are unavailable, a problem that is especially acute in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning framework based on model averaging that integrates information from heterogeneous data sources (fully labeled binary, semi-supervised, and PU data sets) without direct data sharing. For each type of source domain, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. Optimal weights for combining the source models are determined by a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence, covering both misspecified and correctly specified…
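The weight-selection step can be illustrated with a minimal Python sketch. This is not the authors' procedure. It assumes that each candidate (source) model has already produced held-out estimates of P(y=1 | x) on the PU target sample, and it assumes the selected-completely-at-random (SCAR) setting with a known label frequency c, so that the probability of a point being labeled is c * P(y=1 | x). Under those assumptions, the weights are restricted to the probability simplex and chosen to minimize the average negative log-likelihood of the observed labeled/unlabeled indicators s, which differs from the empirical Kullback-Leibler divergence only by a constant that does not depend on the weights. The optimizer (exponentiated-gradient descent), the synthetic data, and the function names `pu_cv_loss` and `select_weights` are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _clip(q, eps=1e-12):
    # Keep probabilities away from 0 and 1 so the logs stay finite.
    return min(max(q, eps), 1.0 - eps)


def pu_cv_loss(weights, probs, s, c):
    """Average negative log-likelihood of the observed PU indicators s.

    probs[i][k] is candidate model k's held-out estimate of P(y=1 | x_i).
    Illustrative SCAR assumption (not taken from the paper):
    P(s=1 | x) = c * P(y=1 | x). Minimizing this average equals minimizing
    the empirical KL divergence up to a weight-independent constant.
    """
    total = 0.0
    for row, si in zip(probs, s):
        q = _clip(c * sum(w * p for w, p in zip(weights, row)))
        total -= si * math.log(q) + (1 - si) * math.log(1.0 - q)
    return total / len(s)


def select_weights(probs, s, c, steps=300, lr=0.5):
    # Hypothetical helper: exponentiated-gradient descent. Each multiplicative
    # update is followed by renormalization, so the weights stay nonnegative
    # and sum to one (i.e. they remain on the probability simplex).
    K, n = len(probs[0]), len(s)
    w = [1.0 / K] * K
    for _ in range(steps):
        grad = [0.0] * K
        for row, si in zip(probs, s):
            q = _clip(c * sum(wk * p for wk, p in zip(w, row)))
            # d/dq of -[s log q + (1-s) log(1-q)], chained through
            # q = c * sum_k w_k * p_k, so dq/dw_k = c * p_k.
            g = -si / q + (1 - si) / (1.0 - q)
            for k in range(K):
                grad[k] += g * c * row[k] / n
        w = [wk * math.exp(-lr * gk) for wk, gk in zip(w, grad)]
        z = sum(w)
        w = [wk / z for wk in w]
    return w


# Synthetic PU target: y is never observed; s = 1 only for a random
# fraction c of the positives (SCAR).
random.seed(0)
c = 0.5
probs, s = [], []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    p_true = sigmoid(2.0 * x)
    y = 1 if random.random() < p_true else 0
    s.append(1 if (y == 1 and random.random() < c) else 0)
    # Candidate 0 is well specified; candidate 1 has the slope's sign flipped.
    probs.append([p_true, 1.0 - p_true])

w = select_weights(probs, s, c)
print([round(wk, 3) for wk in w])
print(pu_cv_loss(w, probs, s, c) < pu_cv_loss([0.5, 0.5], probs, s, c))
```

Because candidate 1 inverts the true relationship, the selected weights should concentrate on candidate 0. Exponentiated gradient is used here only because its multiplicative update keeps the weights on the simplex without a separate projection step; any solver for convex objectives under simplex constraints would serve the same purpose.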
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
