Contrastive Search Is What You Need For Neural Text Generation
Yixuan Su, Nigel Collier

TL;DR
This paper investigates the isotropy of autoregressive language models across multiple languages and demonstrates that contrastive search decoding significantly improves text generation quality without extra training, often reaching human-level performance.
Contribution
The study challenges previous assumptions about model anisotropy, showing most models are naturally isotropic, and validates contrastive search as an effective decoding method across diverse languages.
Findings
Most language models are naturally isotropic, contrary to prior beliefs.
Contrastive search outperforms previous decoding methods on multiple languages.
Contrastive search performs comparably to human-written text on 12 of the 16 languages evaluated.
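To make the isotropy finding concrete, here is a minimal sketch of one way to score a sequence's representations. It assumes isotropy is measured as one minus the average pairwise cosine similarity between the token representations of a sequence; the summary above does not spell out the exact metric, so this definition is an assumption. The function name `isotropy` and the input layout are illustrative.

```python
import numpy as np


def isotropy(hidden_states):
    """Return 1 minus the mean cosine similarity over distinct token pairs.

    hidden_states: array of shape (num_tokens, dim), one row per token.
    Assumed definition (not taken from the summary above): values near 1 mean
    the vectors point in varied directions, values near 0 mean they are
    nearly parallel (anisotropic).
    """
    h = np.asarray(hidden_states, dtype=float)
    n = h.shape[0]
    if n < 2:
        raise ValueError("need at least two token representations")

    # Normalise each row so that dot products equal cosine similarities.
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = unit @ unit.T

    # Average over the n * (n - 1) off-diagonal entries only.
    self_similarity = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(1.0 - self_similarity)
```

Under this definition, mutually orthogonal token vectors score 1 and identical token vectors score 0.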
Abstract
Generating text with autoregressive language models (LMs) is of great importance to many natural language processing (NLP) applications. Previous solutions for this task often produce text that contains degenerative expressions or lacks semantic consistency. Recently, Su et al. introduced a new decoding method, contrastive search, based on the isotropic representation space of the language model and obtained new state of the art on various benchmarks. Additionally, Su et al. argued that the representations of autoregressive LMs (e.g. GPT-2) are intrinsically anisotropic which is also shared by previous studies. Therefore, to ensure the language model follows an isotropic distribution, Su et al. proposed a contrastive learning scheme, SimCTG, which calibrates the language model's representations through additional training. In this study, we first answer the question: "Are…
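The decoding rule the abstract refers to can be sketched as a single selection step. This is a simplified illustration under stated assumptions, not the authors' implementation. It assumes the usual formulation of contrastive search: among the top-k candidate tokens, pick the one maximising `(1 - alpha) * probability - alpha * (maximum cosine similarity to any earlier token representation)`. The `candidate_hidden` lookup table is hypothetical (a real model would need a forward pass per candidate to obtain these vectors), and the defaults `k=5` and `alpha=0.6` are illustrative only.

```python
import numpy as np


def contrastive_search_step(probs, candidate_hidden, context_hidden, k=5, alpha=0.6):
    """Pick the next token id by trading model confidence against similarity to the context.

    probs:            (vocab,) next-token probabilities from the LM.
    candidate_hidden: (vocab, dim) hypothetical lookup of the representation each
                      token would receive if appended (a real LM needs a forward pass).
    context_hidden:   (t, dim) representations of the tokens generated so far.
    """
    probs = np.asarray(probs, dtype=float)
    candidate_hidden = np.asarray(candidate_hidden, dtype=float)
    context_hidden = np.asarray(context_hidden, dtype=float)

    # Restrict the choice to the k most probable tokens.
    top_k = np.argsort(probs)[::-1][:k]

    # Unit-normalise the context so dot products are cosine similarities.
    ctx = context_hidden / np.linalg.norm(context_hidden, axis=1, keepdims=True)

    best_token, best_score = -1, -np.inf
    for v in top_k:
        h_v = candidate_hidden[v] / np.linalg.norm(candidate_hidden[v])
        # Degeneration penalty: closeness to the most similar earlier token.
        penalty = float(np.max(ctx @ h_v))
        score = (1.0 - alpha) * probs[v] - alpha * penalty
        if score > best_score:
            best_token, best_score = int(v), score
    return best_token
```

With `alpha=0` the penalty vanishes and the step reduces to greedy decoding; larger values of `alpha` push the choice away from tokens whose representations repeat the context.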
Code & Models
- 🤗 FredZhang7/distilgpt2-stable-diffusion-v2 · model · 1.4k downloads · ♡ 100
- 🤗 FredZhang7/anime-anything-promptgen-v2 · model · 1.0k downloads · ♡ 62
- 🤗 Cohee/anime-anything-promptgen-onnx · model · 10 downloads · ♡ 1
- 🤗 RichardErkhov/FredZhang7_-_anime-anything-promptgen-v2-4bits · model · 3 downloads
- 🤗 RichardErkhov/FredZhang7_-_distilgpt2-stable-diffusion-v2-4bits · model · 2 downloads
- 🤗 RichardErkhov/FredZhang7_-_distilgpt2-stable-diffusion-v2-8bits · model · 5 downloads
- 🤗 RichardErkhov/FredZhang7_-_anime-anything-promptgen-v2-8bits · model · 2 downloads
- 🤗 tensorblock/anime-anything-promptgen-v2-GGUF · model · 45 downloads · ♡ 2
- 🤗 tensorblock/distilgpt2-stable-diffusion-v2-GGUF · model · 165 downloads · ♡ 1
- 🤗 mradermacher/distilgpt2-stable-diffusion-v2-GGUF · model · 188 downloads
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Contrastive Learning
