Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions

Leonidas Tsepenekas; Ivan Brugere; Freddy Lecue; Daniele Magazzeni

arXiv:2208.12731 · cs.LG · October 24, 2023

TL;DR

This paper introduces an efficient sampling framework to learn similarity functions across different data distributions or demographic groups using limited expert feedback, supported by theoretical bounds and empirical validation.

Contribution

It proposes a novel sampling method for learning cross-group similarity functions with limited feedback, backed by theoretical analysis and extensive experiments.

Findings

01

The framework achieves accurate similarity estimation with limited expert input.

02

Theoretical bounds guarantee the learning performance.

03

Empirical results validate the effectiveness of the proposed approach.

Abstract

Similarity functions measure how comparable pairs of elements are, and play a key role in a wide variety of applications, e.g., notions of Individual Fairness abiding by the seminal paradigm of Dwork et al., as well as Clustering problems. However, access to an accurate similarity function should not always be considered guaranteed, and this point was even raised by Dwork et al. For instance, it is reasonable to assume that when the elements to be compared are produced by different distributions, or in other words belong to different "demographic" groups, knowledge of their true similarity might be very difficult to obtain. In this work, we present an efficient sampling framework that learns these across-groups similarity functions, using only a limited amount of experts' feedback. We show analytical results with rigorous theoretical bounds, and empirically validate our algorithms via…
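The abstract does not spell out the algorithm. The following is therefore only a minimal sketch of the general setting under stated assumptions, not the authors' framework.

It assumes the following:

- There are two groups, each with a known within-group metric (Euclidean here).
- The cross-group similarity σ is available only through a hypothetical `expert` oracle.
- A simple sampling scheme is used. Draw `k` representatives per group and query the expert on the k² representative pairs. Then answer any query (x, y) with the stored value for the nearest representatives of x and y.

The class `RepresentativeSimilarity`, the `expert` function, and the shifted toy data are illustrative inventions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed within-group metric (a hypothetical choice for this sketch)."""
    return math.dist(a, b)


class RepresentativeSimilarity:
    """Hypothetical estimator of a cross-group similarity sigma(x, y).

    Samples k representatives per group, asks the expert oracle only about
    representative pairs, and answers any query (x, y) with the stored
    answer for the nearest representatives of x and y.
    """

    def __init__(self, group_a: Sequence[Point], group_b: Sequence[Point],
                 expert: Callable[[Point, Point], float], k: int,
                 metric: Metric = euclidean, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.reps_a: List[Point] = rng.sample(list(group_a), k)
        self.reps_b: List[Point] = rng.sample(list(group_b), k)
        self.metric = metric
        # Expert feedback is requested once per representative pair: k * k queries.
        self.table: Dict[Tuple[int, int], float] = {
            (i, j): expert(a, b)
            for i, a in enumerate(self.reps_a)
            for j, b in enumerate(self.reps_b)
        }
        self.num_queries = len(self.table)

    def nearest(self, x: Point, reps: List[Point]) -> int:
        """Index of the representative closest to x under the within-group metric."""
        return min(range(len(reps)), key=lambda i: self.metric(x, reps[i]))

    def __call__(self, x: Point, y: Point) -> float:
        i = self.nearest(x, self.reps_a)
        j = self.nearest(y, self.reps_b)
        return self.table[(i, j)]


# Toy usage: group B lives in a shifted coordinate system ("oranges").
rng = random.Random(1)
group_a = [(rng.random(), rng.random()) for _ in range(500)]
group_b = [(rng.random() + 5.0, rng.random() + 5.0) for _ in range(500)]


def expert(x: Point, y: Point) -> float:
    # Hypothetical ground truth: undo the shift of group B, then compare.
    return math.dist(x, (y[0] - 5.0, y[1] - 5.0))


sim = RepresentativeSimilarity(group_a, group_b, expert, k=30)
print(sim.num_queries)  # 900
```

This sketch relies on one further assumption: σ satisfies the triangle inequality with respect to the within-group metrics. The toy `expert` meets this condition, since it is a Euclidean distance after undoing the shift.

Under that assumption, the estimate for (x, y) differs from σ(x, y) by at most d(x, π(x)) + d(y, π(y)). Here π maps a point to its nearest sampled representative. Sampling more representatives shrinks this error at the cost of more expert queries. The paper's own guarantees may rest on different conditions.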

Videos

Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions· slideslive

Taxonomy

Topics: Bayesian Modeling and Causal Inference