The Detection of Distributional Discrepancy for Text Generation

Xingyuan Chen; Ping Cai; Peng Jin; Haokun Du; Hongjun Wang; Xingyu Dai; Jiajun Chen

arXiv:1910.04859·cs.CV·November 26, 2019

Open Access

TL;DR

This paper introduces two new metrics to measure how much the distribution of generated text differs from real text, revealing that current language GANs fail to reduce this discrepancy despite multiple training rounds.

Contribution

The paper proposes two theoretical metrics for accurately measuring distributional discrepancy in text generation and evaluates their effectiveness in assessing language GANs.

Findings

01. The discrepancy between real and generated text remains large after training.

02. Existing language GANs do not effectively minimize distributional differences.

03. Discrepancy increases with more adversarial training rounds.

Abstract

Text generated by neural language models is not as good as real text, which means that the two distributions differ. Generative Adversarial Nets (GANs) are used to reduce this discrepancy. However, some researchers argue that GAN variants do not work at all: when both sample quality (such as BLEU) and sample diversity (such as Self-BLEU) are taken into account, the GAN variants are even worse than a well-tuned language model. But BLEU and Self-BLEU cannot precisely measure this distributional discrepancy. In fact, how to measure the distributional discrepancy between real text and generated text is still an open problem. In this paper, we theoretically propose two metric functions to measure the distributional difference between real text and generated text, and we put forward a method to estimate them. First, we evaluate a language model with these two functions and find…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Convolution