Benchmarking Large Language Model Capabilities for Conditional Generation

Joshua Maynez; Priyanka Agrawal; Sebastian Gehrmann

arXiv:2306.16793 · cs.CL · June 30, 2023 · 1 citation

Open Access

TL;DR

This paper evaluates the capabilities and limitations of large pre-trained language models in natural language generation tasks, providing insights into their performance across different data regimes, languages, and architectures.

Contribution

It adapts existing benchmarks for application-specific generation to large language models and offers an empirical analysis of their strengths and weaknesses.

Findings

01 PLMs vary in applicability across data regimes

02 Performance differs across languages and architectures

03 Guidelines for benchmarking future PLMs are provided

Abstract

Pre-trained large language models (PLMs) underlie most new developments in natural language processing. They have shifted the field from application-specific model pipelines to a single model that is adapted to a wide range of tasks. Autoregressive PLMs like GPT-3 or PaLM, alongside techniques like few-shot learning, have additionally shifted the output modality to generation instead of classification or regression. Despite their ubiquitous use, the generation quality of language models is rarely evaluated when these models are introduced. Additionally, it is unclear how existing generation tasks--while they can be used to compare systems at a high level--relate to the real world use cases for which people have been adopting them. In this work, we discuss how to adapt existing application-specific generation benchmarks to PLMs and provide an in-depth, empirical study of the limitations…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Weight Decay · Residual Connection · Softmax