A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

Robin Vogel; Aurélien Bellet; Stéphan Clémençon

arXiv:1807.06981·stat.ML·January 25, 2019·5 cites

Open Access

TL;DR

This paper develops a probabilistic framework for similarity learning cast as pointwise ROC optimization (maximizing the true positive rate at a fixed false positive rate), with theoretical guarantees and an analysis of the large-scale setting.

Contribution

It introduces a probabilistic approach to similarity learning for pointwise ROC optimization, establishing universal learning rates, faster rates under a noise assumption, and an analysis of the effect of sampling-based approximations.

Findings

01

Universal learning rates for the proposed method.

02

Faster rates under a noise assumption.

03

Effective sampling-based approximations for large-scale data.

Abstract

The performance of many machine learning techniques depends on the choice of an appropriate similarity or distance measure on the input space. Similarity learning (or metric learning) aims at building such a measure from training data so that observations with the same (resp. different) label are as close (resp. far) as possible. In this paper, similarity learning is investigated from the perspective of pairwise bipartite ranking, where the goal is to rank the elements of a database by decreasing order of the probability that they share the same label with some query data point, based on the similarity scores. A natural performance criterion in this setting is pointwise ROC optimization: maximize the true positive rate under a fixed false positive rate. We study this novel perspective on similarity learning through a rigorous probabilistic framework. The empirical version of the problem…
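As an illustration of the pointwise ROC criterion described in the abstract, the following minimal sketch evaluates a fixed similarity function on all pairs of a labeled sample and reports the empirical true positive rate at the largest decision region whose empirical false positive rate stays at or below a level `alpha`. The function names, the negative squared Euclidean similarity, and the toy data are assumptions made for this example; they are not taken from the paper, and the sketch only evaluates a given similarity rather than learning one.

```python
import numpy as np


def pairwise_labels_and_scores(X, y, similarity):
    """Enumerate all pairs i < j; a pair is positive when both labels match."""
    i, j = np.triu_indices(len(y), k=1)
    same = y[i] == y[j]
    scores = similarity(X[i], X[j])
    return same, scores


def tpr_at_fpr(same, scores, alpha):
    """Empirical TPR of the rule 'score > threshold' with empirical FPR <= alpha."""
    neg = np.sort(scores[~same])[::-1]   # negative-pair scores, descending
    k = int(np.floor(alpha * len(neg)))  # negatives allowed above the threshold
    # At most k negative pairs score strictly above neg[k], so FPR <= alpha.
    threshold = neg[k] if k < len(neg) else -np.inf
    tpr = float(np.mean(scores[same] > threshold))
    fpr = float(np.mean(scores[~same] > threshold))
    return tpr, fpr, threshold


def neg_sq_euclidean(a, b):
    """Hypothetical similarity for this sketch: negative squared distance."""
    return -np.sum((a - b) ** 2, axis=1)


if __name__ == "__main__":
    # Toy data (assumption): two Gaussian clusters, one per class.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
    y = np.repeat([0, 1], 50)
    same, scores = pairwise_labels_and_scores(X, y, neg_sq_euclidean)
    tpr, fpr, thr = tpr_at_fpr(same, scores, alpha=0.05)
    print(f"TPR={tpr:.3f} at FPR={fpr:.3f} (threshold={thr:.3f})")
```

Both rates here are averages over pairs of observations, which is why the pair enumeration is quadratic in the sample size; on large samples one would typically evaluate a subset of pairs instead.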

Taxonomy

Topics: Imbalanced Data Classification Techniques · Data-Driven Disease Surveillance · Anomaly Detection Techniques and Applications