Learning A Disentangling Representation For PU Learning
Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy

TL;DR
This paper introduces a neural network-based representation learning method for PU learning that effectively separates positive and negative data clusters, improving classification performance in high-dimensional settings.
Contribution
It proposes a novel clustering-based representation learning approach with vector quantization for PU learning, supported by theoretical justification and experimental validation.
Findings
Outperforms current state-of-the-art methods on simulated data
Effectively separates positive and negative clusters in high-dimensional data
Provides theoretical insights into the clustering approach
Abstract
In this paper, we address the problem of learning a binary (positive vs. negative) classifier given Positive and Unlabeled data, commonly referred to as PU learning. Although rudimentary techniques like clustering, out-of-distribution detection, or positive density estimation can be used to solve the problem in low-dimensional settings, their efficacy progressively deteriorates in higher dimensions due to the increasing complexity of the data distribution. We propose to learn a neural network-based data representation using a loss function that projects the unlabeled data into two (positive and negative) clusters that can be easily identified using simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings. We adopt a vector quantization technique for the learned representations to amplify the separation between…
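As a rough illustration of the final step the abstract describes (identifying the two clusters with a simple clustering technique and snapping each representation to a codebook vector), here is a minimal sketch. It is not the authors' loss function or training procedure: the embeddings are synthetic 2-D stand-ins for learned representations, and the names `two_means` and `pu_cluster_labels` are hypothetical. The positive cluster is assumed to be the one whose centroid lies closest to the mean of the labeled positives.

```python
import numpy as np


def two_means(z, n_iter=20):
    """Plain 2-means on embeddings z; returns (codebook, assignments)."""
    # Deterministic init: the first point and the point farthest from it.
    c0 = z[0]
    c1 = z[np.argmax(np.linalg.norm(z - c0, axis=1))]
    codebook = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                codebook[k] = z[assign == k].mean(axis=0)
    # Final assignment against the final codebook.
    dists = np.linalg.norm(z[:, None, :] - codebook[None, :, :], axis=2)
    return codebook, dists.argmin(axis=1)


def pu_cluster_labels(z_pos, z_unl):
    """Label unlabeled embeddings (1 = positive) and quantize them."""
    codebook, assign = two_means(z_unl)
    # Assumption: the positive cluster is the one nearest the labeled-positive mean.
    mu_p = z_pos.mean(axis=0)
    pos_code = int(np.linalg.norm(codebook - mu_p, axis=1).argmin())
    quantized = codebook[assign]  # each embedding replaced by its codebook vector
    return (assign == pos_code).astype(int), quantized


# Synthetic stand-in for learned representations (not data from the paper).
rng = np.random.default_rng(0)
z_pos = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
z_unl = np.concatenate([
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(100, 2)),    # hidden positives
    rng.normal(loc=[-3.0, -3.0], scale=0.3, size=(100, 2)),  # hidden negatives
])
pred, quantized = pu_cluster_labels(z_pos, z_unl)
```

On well-separated embeddings like these, `pred` marks the first 100 unlabeled points as positive and the rest as negative. The hard part of PU learning in high dimensions is producing such a separation in the first place, which this sketch does not attempt.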
Peer Reviews
Decision·ICLR 2024 Conference Withdrawn Submission
The paper is well written and easy to understand. The results look good and may be easy to reproduce if the authors publish their code.
1. The idea is simple, and the novelty may not be strong enough for publication at such a high-standard conference. 2. The k-means algorithm needs to be rerun in each iteration. Although the idea is simple, it will be very slow if the data size is huge. 3. It is not convincing that the proposed method can handle the case where the labels are imbalanced.
- This paper is well-written and quite easy to follow. - The experimental results and ablation study show the effectiveness of the proposed method.
- The innovation of this paper seems to be limited. The authors directly employ the existing vector quantization technique [1] to learn a disentangling representation for PU learning with little modification. Though the idea of learning a disentangling representation for PU learning may be interesting, the contribution of this paper is very limited. In addition, there is no reference to the original paper “Neural discrete representation learning” [1] for the vector quantization tech
- The paper is well written. - The idea of introducing codebook representations into PU learning is novel.
- It is still unclear to me why the proposed method works for PU learning. Although the authors provided some theoretical explanations, I am still not clear why the proposed method can separate feature representations of P and N data. - The proposed method is influenced by the center representations of P and U data ($\mu_P$ and $\mu_U$). If the two representations are too close, there seems to be no guarantee that the method will work well. - In Eq.(6), the authors claim that they do not need $
1. The studied problem in this paper is very important. 2. The experiments are sufficient.
1. The authors claim that existing PU learning methods suffer a gradual decline in performance as the dimensionality of the data increases. It would be better if the authors could visualize this effect. This is very important, as it is the research motivation of the paper. 2. Since the authors claim that high dimensionality is harmful for PU methods, have the authors tried first applying dimension reduction via some existing approaches and then deploying traditional PU classi
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
