Model Criticism for Long-Form Text Generation
Yuntian Deng, Volodymyr Kuleshov, Alexander M. Rush

TL;DR
This paper introduces a model criticism approach in latent space to evaluate high-level structure in long-form text generated by language models, revealing strengths in topicality but weaknesses in coherence and coreference.
Contribution
It applies a novel latent space model criticism method to assess high-level discourse structure in generated text, highlighting specific failure modes.
Findings
Transformers capture topical structures well.
Models struggle with coherence and coreference.
Latent space analysis reveals specific failure modes.
Abstract
Language models have demonstrated the ability to generate highly fluent text; however, it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of the generated text. Model criticism compares the distributions between real and generated data in a latent space obtained according to an assumptive generative process. Different generative processes identify specific failure modes of the underlying model. We perform experiments on three representative aspects of high-level discourse -- coherence, coreference, and topicality -- and find that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
