Learning from Noisy Similar and Dissimilar Data

Soham Dan; Han Bao; Masashi Sugiyama

arXiv:2002.00995 · cs.LG · February 5, 2020 · 1 citation

Open Access

TL;DR

This paper introduces methods for training classifiers from noisy pairwise supervision, given as similar and dissimilar pairs. Such supervision arises in privacy-sensitive and crowd-sourced settings, and the proposed methods outperform noise-blind baselines.

Contribution

The paper proposes two algorithms for learning from noisy similar-dissimilar (S-D) data, analyzes their theoretical properties, and establishes connections to traditional label-based learning.

Findings

01 Algorithms outperform noise-blind baselines

02 Effective under two realistic noise models

03 Validated on synthetic and real datasets

Abstract

With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pairwise---in the form of Similar (S) pairs (if two examples belong to the same class) and Dissimilar (D) pairs (if two examples belong to different classes). This kind of supervision is realistic in privacy-sensitive domains. Although this problem has been looked at recently, it is unclear how to learn from such supervision under label noise, which is very common when the supervision is crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S and D labeled data. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D…
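To make the pairwise supervision described in the abstract concrete, the following is a minimal Python sketch of how noisy S and D pair tags might be simulated from class labels. It is illustrative only: the function name `make_noisy_sd_pairs`, the parameter `flip_prob`, and the choice to flip each pair tag independently with a fixed probability are all assumptions made here. The excerpt does not specify the paper's two noise models, and this sketch is not meant to reproduce either of them.

```python
import random


def make_noisy_sd_pairs(labels, n_pairs, flip_prob, seed=0):
    """Sample pairs of examples and tag each as 'S' (similar) or 'D' (dissimilar).

    Assumption (not from the paper): each clean tag is flipped independently
    with probability flip_prob to mimic unreliable crowd-sourced annotation.
    Returns a list of (i, j, clean_tag, noisy_tag) tuples.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Draw two distinct example indices.
        i, j = rng.sample(range(len(labels)), 2)
        # Clean tag: 'S' if both examples share a class, otherwise 'D'.
        clean = "S" if labels[i] == labels[j] else "D"
        # Hypothetical noise: flip the tag with probability flip_prob.
        if rng.random() < flip_prob:
            noisy = "D" if clean == "S" else "S"
        else:
            noisy = clean
        pairs.append((i, j, clean, noisy))
    return pairs


if __name__ == "__main__":
    toy_labels = [0, 1, 0, 1, 1, 0]  # hypothetical binary class labels
    for i, j, clean, noisy in make_noisy_sd_pairs(toy_labels, 5, flip_prob=0.2):
        print(i, j, clean, noisy)
```

A learner in this setting would see only the index pairs and the noisy tags, never the underlying class labels or the clean tags; those are returned here only so the amount of corruption can be inspected.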

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms