R-divergence for Estimating Model-oriented Distribution Discrepancy

Zhilin Zhao; Longbing Cao

arXiv:2310.01109·cs.LG·October 3, 2023

TL;DR

R-divergence is a novel method for assessing distribution differences tailored to specific models, improving robustness and accuracy in tasks like noisy label training.

Contribution

The paper introduces R-divergence, a new model-oriented divergence measure. It estimates the discrepancy between two distributions by learning a hypothesis on their mixture and comparing the risk that hypothesis incurs on each distribution, achieving state-of-the-art results.

Findings

1. R-divergence outperforms existing methods in distribution discrepancy testing.
2. It can be used to train robust neural networks on data with noisy labels.
3. It demonstrates practical utility in real-world tasks.

Abstract

Real-life data are often non-IID due to complex distributions and interactions, and the sensitivity to the distribution of samples can differ among learning models. Accordingly, a key question for any supervised or unsupervised model is whether the probability distributions of two given datasets can be considered identical. To address this question, we introduce R-divergence, designed to assess model-oriented distribution discrepancies. The core insight is that two distributions are likely identical if their optimal hypothesis yields the same expected risk for each distribution. To estimate the distribution discrepancy between two datasets, R-divergence learns a minimum hypothesis on the mixed data and then gauges the empirical risk difference between them. We evaluate the test power across various unsupervised and supervised tasks and find that R-divergence achieves state-of-the-art…
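The estimation procedure described in the abstract (learn a hypothesis on the mixed data, then compare its empirical risk on each dataset) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypothesis class (linear least squares), the loss (mean squared error), the function names `fit_linear`, `mse_risk`, and `empirical_r_divergence`, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def fit_linear(X, y):
    # Assumed hypothesis class: linear model with a bias term, fit by least squares.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def mse_risk(w, X, y):
    # Assumed loss: mean squared error of the linear model on (X, y).
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((A @ w - y) ** 2))

def empirical_r_divergence(Xp, yp, Xq, yq):
    # Step 1: learn one hypothesis that minimizes empirical risk on the mixed data.
    X_mix = np.vstack([Xp, Xq])
    y_mix = np.concatenate([yp, yq])
    w = fit_linear(X_mix, y_mix)
    # Step 2: gauge the gap between its empirical risks on the two datasets.
    return abs(mse_risk(w, Xp, yp) - mse_risk(w, Xq, yq))

# Synthetic data (illustrative only): same inputs and weights, different label noise.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

Xp = rng.normal(size=(500, 3))
yp = Xp @ true_w + 0.1 * rng.normal(size=500)

Xq_same = rng.normal(size=(500, 3))
yq_same = Xq_same @ true_w + 0.1 * rng.normal(size=500)

Xq_noisy = rng.normal(size=(500, 3))
yq_noisy = Xq_noisy @ true_w + 1.0 * rng.normal(size=500)  # much noisier labels

d_same = empirical_r_divergence(Xp, yp, Xq_same, yq_same)
d_noisy = empirical_r_divergence(Xp, yp, Xq_noisy, yq_noisy)
print(f"same distribution:  {d_same:.4f}")
print(f"noisier labels:     {d_noisy:.4f}")
```

In this toy setting the estimate should be close to zero when both datasets come from the same distribution and noticeably larger when one dataset has noisier labels. Because the hypothesis and loss are pluggable, the same pattern reflects the "model-oriented" idea: a different model class may rate the same pair of datasets as more or less discrepant.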

Videos

R-divergence for Estimating Model-oriented Distribution Discrepancy · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis