Summarising Historical Text in Modern Languages

Xutan Peng; Yi Zheng; Chenghua Lin; Advaith Siddharthan

arXiv:2101.10759·cs.CL·January 25, 2022

TL;DR

This paper introduces the task of summarising historical texts in the corresponding modern language. It compiles a new dataset of historical German and Chinese news and proposes a transfer learning model that outperforms existing methods, supporting the work of historians and digital humanities researchers.

Contribution

The paper presents the first dataset for historical text summarisation and a transfer learning approach that works without parallel historical-modern data.

Findings

01. The proposed model outperforms standard cross-lingual benchmarks.

02. The dataset highlights the unique challenges of historical-to-modern language summarisation.

03. Automatic and human evaluations confirm the effectiveness of the approach.

Abstract

We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine to historians and digital humanities researchers but has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical to modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historic to modern language summarisation task from standard cross-lingual summarisation (i.e., modern to modern language), highlight…

Code & Models

Repositories

Pzoom522/HistSumm
none · Official
