Why GANs are overkill for NLP
David Alvarez-Melis, Vikas Garg, Adam Tauman Kalai

TL;DR
This paper offers a theoretical explanation for why GANs have been less widely adopted for NLP than for computer vision, arguing that likelihood-based training is a more efficient way to reach the same goal on sequential data such as text.
Contribution
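It introduces a novel theoretical framework showing that maximizing likelihood (equivalently, minimizing KL-divergence) is a more efficient way to minimize the same distinguishability criteria that adversarial models target, which helps explain the limited success of GANs in NLP.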
Findings
Likelihood maximization and distinguishability minimization are closely related.
Minimizing KL-divergence effectively reduces model distinguishability.
A new next-token distinguishability model enables a polynomial-time reduction.
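The finding that minimizing KL-divergence reduces distinguishability can be illustrated with a standard inequality; this is not the paper's own reduction. For a single sample, the best advantage any distinguisher can achieve between distributions p and q equals their total variation (TV) distance, and Pinsker's inequality bounds that distance by sqrt(KL(p || q) / 2). The Python sketch below assumes a hypothetical three-token vocabulary and made-up model distributions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """TV distance: the best advantage any single-sample distinguisher can achieve."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def pinsker_bound(p, q):
    """Pinsker's inequality: TV(p, q) <= sqrt(KL(p || q) / 2)."""
    return math.sqrt(kl_divergence(p, q) / 2)

# Hypothetical "true" next-token distribution over a 3-token vocabulary.
p = [0.5, 0.3, 0.2]

# Hypothetical model distributions, each closer to p than the last.
models = [
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
    [0.45, 0.3, 0.25],
    [0.5, 0.3, 0.2],
]

for q in models:
    kl = kl_divergence(p, q)
    tv = total_variation(p, q)
    print(f"KL={kl:.4f}  TV={tv:.4f}  Pinsker bound={pinsker_bound(p, q):.4f}")
```

As the models approach p, KL shrinks and the Pinsker bound forces the distinguisher's advantage toward zero along with it.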
Abstract
This work offers a novel theoretical perspective on why, despite numerous attempts, adversarial approaches to generative modeling (e.g., GANs) have not been as popular for certain generation tasks, particularly sequential tasks such as Natural Language Generation, as they have in others, such as Computer Vision. In particular, on sequential data such as text, maximum-likelihood approaches are significantly more utilized than GANs. We show that, while it may seem that maximizing likelihood is inherently different than minimizing distinguishability, this distinction is largely artificial and only holds for limited models. We argue that minimizing KL-divergence (i.e., maximizing likelihood) is a more efficient approach to effectively minimizing the same distinguishability criteria that adversarial models seek to optimize. Reductions show that minimizing distinguishability can be seen as…
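The abstract treats minimizing KL-divergence and maximizing likelihood as the same objective. The standard identity behind this is KL(p || q) = -E_{x~p}[log q(x)] - H(p), where the entropy H(p) does not depend on the model q. The Python sketch below checks the identity on hypothetical distributions and confirms that ranking candidate models by expected log-likelihood and by KL selects the same model. For simplicity it uses exact expectations rather than sampled training data.

```python
import math

def expected_log_likelihood(p, q):
    """E_{x~p}[log q(x)]: population version of model q's average log-likelihood on data from p."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)]; depends only on the data distribution, not on the model."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = E_{x~p}[log p(x) - log q(x)]."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical data distribution and candidate models over a 3-token vocabulary.
p = [0.5, 0.3, 0.2]
candidates = {
    "q1": [0.2, 0.4, 0.4],
    "q2": [0.4, 0.35, 0.25],
    "q3": [0.5, 0.3, 0.2],
}

for name, q in candidates.items():
    # KL(p || q) = -E[log q(x)] - H(p). Because H(p) is a constant,
    # ordering models by likelihood and by KL gives the same ranking.
    assert math.isclose(
        kl_divergence(p, q), -expected_log_likelihood(p, q) - entropy(p), abs_tol=1e-12
    )

best_by_likelihood = max(candidates, key=lambda n: expected_log_likelihood(p, candidates[n]))
best_by_kl = min(candidates, key=lambda n: kl_divergence(p, candidates[n]))
print(best_by_likelihood, best_by_kl)  # prints "q3 q3"
```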
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
Methods: Softmax
