Learning A Disentangling Representation For PU Learning

Omar Zamzam; Haleh Akrami; Mahdi Soltanolkotabi; Richard Leahy

arXiv:2310.03833 · cs.LG · October 9, 2023 · 1 cites

Open Access 4 Reviews

TL;DR

This paper introduces a neural network-based representation learning method for PU learning that effectively separates positive and negative data clusters, improving classification performance in high-dimensional settings.

Contribution

It proposes a novel clustering-based representation learning approach with vector quantization for PU learning, supported by theoretical justification and experimental validation.

Findings

01. Outperforms current state-of-the-art methods on simulated data

02. Effectively separates positive and negative clusters in high-dimensional data

03. Provides theoretical insights into the clustering approach

Abstract

In this paper, we address the problem of learning a binary (positive vs. negative) classifier given positive and unlabeled data, commonly referred to as PU learning. Although rudimentary techniques like clustering, out-of-distribution detection, or positive density estimation can be used to solve the problem in low-dimensional settings, their efficacy progressively deteriorates in higher dimensions due to the increasing complexity of the data distribution. We propose to learn a neural network-based data representation using a loss function that projects the unlabeled data into two (positive and negative) clusters that can be easily identified using simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings. We adopt a vector quantization technique for the learned representations to amplify the separation between…
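The final step the abstract describes, identifying two clusters in the learned representation with a simple clustering technique, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the 2-D synthetic points stand in for learned embeddings, and the helper names `two_means` and `label_unlabeled`, as well as the rule of calling "positive" the cluster whose centroid lies closer to the mean of the labeled positives, are assumptions made for the example.

```python
import numpy as np

def two_means(x, n_iter=50, seed=0):
    """Plain Lloyd's k-means with k=2 on the rows of x (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=2, replace=False)].copy()
    for _ in range(n_iter):
        # Distance of every point to both centers, shape (n, 2).
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        new_centers = np.array([
            x[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute assignments so they match the returned centers.
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)

def label_unlabeled(z_pos, z_unl):
    """Assign 1 (positive) or 0 (negative) to each unlabeled embedding.

    Assumption for this sketch: the positive cluster is the one whose
    centroid is nearest to the mean of the labeled positive embeddings.
    """
    centers, assign = two_means(z_unl)
    mu_p = z_pos.mean(axis=0)
    pos_cluster = np.linalg.norm(centers - mu_p, axis=1).argmin()
    return (assign == pos_cluster).astype(int)

# Synthetic stand-in for learned embeddings: two well-separated blobs.
rng = np.random.default_rng(42)
z_pos = rng.normal(loc=3.0, scale=1.0, size=(50, 2))        # labeled positives
unl_pos = rng.normal(loc=3.0, scale=1.0, size=(200, 2))     # hidden positives
unl_neg = rng.normal(loc=-3.0, scale=1.0, size=(200, 2))    # hidden negatives
z_unl = np.vstack([unl_pos, unl_neg])
y_true = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])

pred = label_unlabeled(z_pos, z_unl)
accuracy = (pred == y_true).mean()
print(f"accuracy on synthetic unlabeled set: {accuracy:.3f}")
```

The sketch only works this cleanly because the synthetic blobs are already well separated; the paper's contribution, per the abstract, is learning a representation in which high-dimensional data behaves this way.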

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

The paper is well written and easy to understand. The results sound good and may be easy to reproduce if the authors publish their code.

Weaknesses

1. The idea is simple, and the novelty may not be strong enough for publication at such a high-standard conference.
2. The k-means algorithm needs to be rerun in each iteration. Although the idea is simple, it will be very slow if the dataset is large.
3. It is not convincing that the proposed method can handle the case where the labels are imbalanced.

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

- This paper is well-written and quite easy to follow.
- The experimental results and ablation study show the effectiveness of the proposed method.

Weaknesses

- The innovation of this paper seems to be limited. The authors directly employ the existing vector quantization technique [1] to learn a disentangling representation for PU learning with little modification. Though the idea of learning a disentangling representation for PU learning may be interesting, the contribution of this paper is very limited. In addition, the paper lacks a reference to the original paper “Neural discrete representation learning” [1] of the vector quantization tech

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

- The paper is well written.
- The idea of introducing codebook representations into PU learning is novel.

Weaknesses

- It is still unclear to me why the proposed method works for PU learning. Although the authors provided some theoretical explanations, I am still not clear why the proposed method can separate feature representations of P and N data.
- The proposed method is influenced by the center representations of P and U data ($\mu_P$ and $\mu_U$). If the two representations are too close, there seems to be no guarantee that the method will work well.
- In Eq.(6), the authors claim that they do not need $

Reviewer 04 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The studied problem in this paper is very important.
2. The experiments are sufficient.

Weaknesses

1. The authors claim that existing PU learning methods suffer a gradual decline in performance as the dimensionality of the data increases. It would be better if the authors visualized this effect. This is very important, as it is the research motivation of this paper.
2. Since the authors claim that high dimensionality is harmful for PU methods, have the authors tried first applying dimension reduction via some existing approaches and then deploy traditional PU classi

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning