Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data
Lan Tao, Shirong Xu, Chi-Hua Wang, Namjoon Suh, Guang Cheng

TL;DR
This paper introduces a discriminative method to estimate the total variation distance between distributions, providing a fidelity measure for generative data, with theoretical convergence guarantees and empirical validation on Gaussian and image data.
Contribution
It proposes a novel discriminative approach linking Bayes risk to TV distance estimation, with theoretical analysis and practical application to synthetic image data.
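The link between Bayes risk and TV distance can be sketched with the standard identity for balanced binary classification. The notation here is an assumption for illustration: $p$ and $q$ are densities of the two distributions $P$ and $Q$, the class priors are equal, and $R^*$ is the Bayes risk under 0-1 loss. The paper's exact formulation may differ.

```latex
% Assumed setup: Y ~ Bernoulli(1/2), X | Y=0 ~ P, X | Y=1 ~ Q.
\begin{align*}
R^* &= \tfrac{1}{2} \int \min\{p(x),\, q(x)\}\, dx, \\
\mathrm{TV}(P, Q) &= \tfrac{1}{2} \int |p(x) - q(x)|\, dx
                   = 1 - \int \min\{p(x),\, q(x)\}\, dx, \\
\text{hence}\quad \mathrm{TV}(P, Q) &= 1 - 2 R^*.
\end{align*}
```

The second line uses $|a - b| = a + b - 2\min\{a, b\}$ together with $\int p = \int q = 1$. Under this identity, any estimate of the Bayes risk yields an estimate of the TV distance.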
Findings
A fast convergence rate is achievable when estimating the TV distance between two Gaussians, given a suitable choice of hypothesis class.
Estimation accuracy improves with greater separation of Gaussian distributions.
Empirical validation confirms theoretical results and effectiveness on MNIST data.
Abstract
With the proliferation of generative AI and the increasing volume of generative data (also called synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. Therefore, the estimation of total variation distance reduces to that of the Bayes risk. In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of TV distance between two Gaussian distributions. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be…
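The reduction described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes two one-dimensional unit-variance Gaussians, equal class priors, and a simple midpoint-threshold classifier; it uses the standard balanced-class identity TV = 1 - 2 x (Bayes risk). The function names `true_tv_gaussians` and `estimate_tv` are hypothetical.

```python
import math
import random


def true_tv_gaussians(mu: float) -> float:
    """Closed-form TV distance between N(0, 1) and N(mu, 1)."""
    # TV = 2 * Phi(|mu| / 2) - 1, and 2 * Phi(x) - 1 = erf(x / sqrt(2)).
    return math.erf(abs(mu) / (2.0 * math.sqrt(2.0)))


def estimate_tv(mu: float, n: int = 20000, seed: int = 0) -> float:
    """Estimate TV(N(0,1), N(mu,1)) from the test error of a classifier.

    Assumption: equal class priors, so TV = 1 - 2 * risk for the
    Bayes-optimal rule; here the rule is learned from training samples.
    """
    rng = random.Random(seed)

    def draw(mean: float, k: int) -> list:
        return [rng.gauss(mean, 1.0) for _ in range(k)]

    train_p, train_q = draw(0.0, n), draw(mu, n)
    test_p, test_q = draw(0.0, n), draw(mu, n)

    # Fit: threshold at the midpoint of the two sample means.
    mean_p = sum(train_p) / n
    mean_q = sum(train_q) / n
    threshold = 0.5 * (mean_p + mean_q)
    sign = 1.0 if mean_q >= mean_p else -1.0

    # Predict "Q" when sign * (x - threshold) > 0, otherwise "P".
    err_p = sum(sign * (x - threshold) > 0 for x in test_p) / n
    err_q = sum(sign * (x - threshold) <= 0 for x in test_q) / n
    risk = 0.5 * (err_p + err_q)  # balanced held-out misclassification rate

    return 1.0 - 2.0 * risk


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0, 4.0):
        print(mu, round(true_tv_gaussians(mu), 4), round(estimate_tv(mu), 4))
```

The midpoint threshold is the Bayes-optimal rule for two equal-variance Gaussians, which is why this simple hypothesis class suffices in the sketch; a classifier that falls short of the Bayes rule would make the plug-in value underestimate the TV distance on average.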
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Mining Algorithms and Applications
