Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez, Priyanka Agrawal, Sebastian Gehrmann

TL;DR
This paper evaluates the capabilities and limitations of large pre-trained language models in natural language generation tasks, providing insights into their performance across different data regimes, languages, and architectures.
Contribution
It adapts existing benchmarks for application-specific generation to large language models and offers an empirical analysis of their strengths and weaknesses.
Findings
PLMs vary in applicability across data regimes
Performance differs across languages and architectures
Guidelines for benchmarking future PLMs are provided
Abstract
Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classification or regression. Despite their ubiquitous use, the generation quality of language models is rarely evaluated when these models are introduced. Additionally, it is unclear how existing generation tasks--while they can be used to compare systems at a high level--relate to the real world use cases for which people have been adopting them. In this work, we discuss how to adapt existing application-specific generation benchmarks to PLMs and provide an in-depth, empirical study of the limitations…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Weight Decay · Residual Connection · Softmax
