Separation Results between Fixed-Kernel and Feature-Learning Probability Metrics
Carles Domingo-Enrich, Youssef Mroueh

TL;DR
This paper proves that feature-learning discriminators can distinguish pairs of high-dimensional distributions that fixed-kernel discriminators cannot, which offers a theoretical explanation for the empirical advantage of learned features in generative modeling.
Contribution
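The paper proves separation results between fixed-kernel and feature-learning probability metrics. It models the two kinds of discriminators with the function classes $\mathcal{F}_2$ and $\mathcal{F}_1$ respectively, and constructs pairs of distributions over hyper-spheres on which the resulting metrics behave differently.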
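As a hedged reminder of the objects involved, the block below writes out the standard definition of an integral probability metric and the closed forms it takes over the unit balls of the two function classes. The symbols $\Delta$, $\sigma$ and $\tau$ are notation assumed here for illustration (neuron gap, activation, and a fixed probability measure over neuron weights); the summary above does not define them, and the paper's own normalization may differ.

```latex
% IPM over a generic discriminator class \mathcal{F}
d_{\mathcal{F}}(\mu,\nu) = \sup_{f \in \mathcal{F}}
  \left| \mathbb{E}_{x\sim\mu} f(x) - \mathbb{E}_{x\sim\nu} f(x) \right|

% Gap seen by a single neuron with weight \theta (assumed notation)
\Delta(\theta) = \mathbb{E}_{x\sim\mu}\,\sigma(\langle\theta,x\rangle)
               - \mathbb{E}_{x\sim\nu}\,\sigma(\langle\theta,x\rangle)

% Feature learning: the discriminator may pick the best neuron
d_{\mathcal{B}_{\mathcal{F}_1}}(\mu,\nu) = \sup_{\theta} \left| \Delta(\theta) \right|

% Fixed kernel: the gap is averaged over the fixed measure \tau
d_{\mathcal{B}_{\mathcal{F}_2}}(\mu,\nu) =
  \left( \int \Delta(\theta)^2 \, d\tau(\theta) \right)^{1/2}
```

Under these assumptions the fixed-kernel value never exceeds the feature-learning value, since a root mean square over $\tau$ is bounded by the supremum. A separation result concerns how much smaller it can become as the dimension grows.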
Findings
Feature-learning discriminators outperform fixed-kernel ones in high-dimensional distribution discrimination.
Fixed-kernel metrics are weaker and cannot distinguish certain distribution pairs discriminated by feature learning.
Links are established between the $\mathcal{F}_1$ and $\mathcal{F}_2$ IPMs and sliced Wasserstein distances.
Abstract
Several works in implicit and explicit generative modeling empirically observed that feature-learning discriminators outperform fixed-kernel discriminators in terms of the sample quality of the models. We provide separation results between probability metrics with fixed-kernel and feature-learning discriminators using the function classes $\mathcal{F}_2$ and $\mathcal{F}_1$ respectively, which were developed to study overparametrized two-layer neural networks. In particular, we construct pairs of distributions over hyper-spheres that cannot be discriminated by fixed kernel ($\mathcal{F}_2$) integral probability metric (IPM) and Stein discrepancy (SD) in high dimensions, but that can be discriminated by their feature learning ($\mathcal{F}_1$) counterparts. To further study the separation we provide links between the $\mathcal{F}_1$ and $\mathcal{F}_2$ IPMs with sliced Wasserstein…
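The contrast described in the abstract can be illustrated numerically with a small sketch. The code below is not the paper's construction: the toy distribution pair (uniform on the sphere versus the same with its first coordinate made non-negative), the ReLU activation, the random-feature approximation, and all function names are assumptions chosen for illustration. The fixed-kernel estimate averages squared neuron gaps over random directions, while the feature-learning estimate takes the largest gap over a candidate set that includes one hand-picked direction standing in for a learned feature.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def neuron_gaps(x, y, thetas):
    """Per-neuron gap: mean ReLU(<theta, x>) minus mean ReLU(<theta, y>)."""
    fx = np.maximum(x @ thetas.T, 0.0).mean(axis=0)
    fy = np.maximum(y @ thetas.T, 0.0).mean(axis=0)
    return fx - fy


def fixed_kernel_ipm(x, y, thetas):
    """Random-feature estimate: root mean square of the neuron gaps."""
    return float(np.sqrt(np.mean(neuron_gaps(x, y, thetas) ** 2)))


def feature_learning_ipm(x, y, candidates):
    """Crude lower bound: largest absolute gap over candidate neurons."""
    return float(np.max(np.abs(neuron_gaps(x, y, candidates))))


rng = np.random.default_rng(0)
d, n, m = 50, 5000, 200  # dimension, samples per distribution, random neurons

x = sample_sphere(n, d, rng)   # samples from the uniform distribution
y = sample_sphere(n, d, rng)
y[:, 0] = np.abs(y[:, 0])      # toy perturbation along e_1; norms are unchanged

thetas = sample_sphere(m, d, rng)
# Add e_1 as a stand-in for the direction a trained discriminator might find.
candidates = np.vstack([thetas, np.eye(d)[:1]])

f2 = fixed_kernel_ipm(x, y, thetas)
f1 = feature_learning_ipm(x, y, candidates)
print(f"fixed-kernel estimate:     {f2:.4f}")
print(f"feature-learning estimate: {f1:.4f}")
```

In practice a feature-learning discriminator would search for the direction by optimization rather than being handed it, so the second number is best read as a lower bound on what such a discriminator could reach. Rerunning with larger `d` gives a rough sense of how the two estimates drift apart on this toy pair.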
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
