Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization

Chi Seng Cheang; Hou Pong Chan; Derek F. Wong; Xuebo Liu; Zhaocong Li; Yanming Sun; Shudong Liu; Lidia S. Chao

arXiv:2305.01951 · cs.CL · November 3, 2023 · 2 cites

TL;DR

This paper investigates how well pre-trained language models generalize to future data in text summarization, revealing that memorized knowledge impacts faithfulness and current methods struggle to improve future performance.

Contribution

The study introduces TempoSum, a new benchmark for evaluating temporal generalization in summarization models, and provides insights into the limitations of existing faithfulness enhancement techniques.

Findings

01. Parametric knowledge affects summary faithfulness on future data

02. Existing faithfulness methods do not reliably improve future performance

03. Models struggle to generalize to future data in a benchmark spanning 2010 to 2022

Abstract

Recent pre-trained language models (PLMs) achieve promising results on existing abstractive summarization datasets. However, existing summarization benchmarks overlap in time with the standard pre-training corpora and fine-tuning datasets. Hence, the strong performance of PLMs may rely on the parametric knowledge that is memorized during pre-training and fine-tuning. Moreover, the knowledge memorized by PLMs may quickly become outdated, which affects the generalization performance of PLMs on future data. In this work, we propose TempoSum, a novel benchmark that contains data samples from 2010 to 2022, to understand the temporal generalization ability of abstractive summarization models. Through extensive human evaluation, we show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data. Moreover, existing…

Code & Models

Repositories

nlp2ct/temposum
none · Official

Datasets

chiseng-cheang/TempoSum
dataset · 12 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies