Learning from Noisy Similar and Dissimilar Data
Soham Dan, Han Bao, Masashi Sugiyama

TL;DR
This paper introduces methods for training classifiers from noisy pairwise supervision, namely similar (S) and dissimilar (D) pairs. This setting arises in privacy-sensitive and crowd-sourced scenarios, and the proposed methods outperform noise-blind baselines.
Contribution
It proposes two algorithms for learning from noisy similar-dissimilar (S-D) data, analyzes their theoretical properties, and establishes connections to traditional learning from class labels.
Findings
Algorithms outperform noise-blind baselines
Effective under two realistic noise models
Validated on synthetic and real datasets
Abstract
With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pairwise, in the form of Similar (S) pairs (if two examples belong to the same class) and Dissimilar (D) pairs (if two examples belong to different classes). This kind of supervision is realistic in privacy-sensitive domains. Although this problem has been looked at recently, it is unclear how to learn from such supervision under label noise, which is very common when the supervision is crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S and D labeled data. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D…
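To make the S-D definition concrete, here is a minimal Python sketch, assuming binary class labels encoded as +1/-1. It turns ordinary labels into S-D pairs (a pair is similar exactly when the product of its two labels is +1) and then corrupts the pair tags with a simple symmetric flip. The function names `make_sd_pairs` and `flip_pair_labels` are hypothetical, and the flip model is an illustrative assumption only; it is not necessarily either of the two noise models studied in the paper.

```python
import random


def make_sd_pairs(labels, num_pairs, seed=0):
    """Sample index pairs and tag each as similar (+1) or dissimilar (-1).

    Assumes binary labels in {+1, -1} and at least two examples.
    A pair is similar iff labels[i] * labels[j] == +1, i.e. same class.
    """
    rng = random.Random(seed)
    n = len(labels)
    pairs = []
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct example indices
        pairs.append((i, j, labels[i] * labels[j]))
    return pairs


def flip_pair_labels(pairs, flip_prob, seed=0):
    """Hypothetical symmetric noise (an assumption, not the paper's model):
    each S/D tag is flipped independently with probability flip_prob."""
    rng = random.Random(seed)
    return [(i, j, -s if rng.random() < flip_prob else s) for i, j, s in pairs]


if __name__ == "__main__":
    labels = [1, 1, -1, -1, 1]
    clean = make_sd_pairs(labels, num_pairs=8, seed=0)
    noisy = flip_pair_labels(clean, flip_prob=0.2, seed=0)
    for (i, j, s), (_, _, s_noisy) in zip(clean, noisy):
        print(f"pair ({i}, {j}): clean={'S' if s == 1 else 'D'}, "
              f"observed={'S' if s_noisy == 1 else 'D'}")
```

A learner that ignores this corruption would treat the observed tags as ground truth; the paper's algorithms are designed to account for such noise, which this sketch only simulates.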
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms
