AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature

Melissa Roemmele; Kyle Shaffer; Katrina Olsen; Yiyi Wang; Steve DeNeefe

arXiv:2302.06579 · cs.CL · February 14, 2023



TL;DR

This paper introduces AbLit, a new dataset for the challenging task of creating abridged versions of English literature, along with models for the task. It focuses on passage-level alignment between original and abridged texts and on predicting the linguistic relations within those alignments.

Contribution

It presents the first NLP-focused resource for abridgement: a dataset of aligned original and abridged passages, plus models for relation prediction and abridgement generation.

Findings

01. Abridgement is a complex NLP task.

02. The dataset enables new research in text simplification.

03. Automated models show promising results in predicting relations and generating abridged texts.

Abstract

Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.
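The idea of aligned passages with a relation between each original and abridged pair can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions: the `AlignedPassage` record, the `tokens` helper, and the `coarse_relation` labels (`omitted`, `unchanged`, `condensed`, `rewritten`) are hypothetical illustrations, not the AbLit data format or the relation scheme defined in the paper. Word overlap is measured with Python's standard `difflib.SequenceMatcher`, a swapped-in heuristic rather than the authors' models.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lowercased word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


@dataclass
class AlignedPassage:
    """Hypothetical record: one original passage and its aligned abridged passage."""
    original: str
    abridged: str  # empty string if the passage was dropped from the abridgement

    def compression_ratio(self) -> float:
        """Abridged word count / original word count (1.0 means no shortening)."""
        n_orig = len(tokens(self.original))
        return len(tokens(self.abridged)) / n_orig if n_orig else 0.0


def coarse_relation(p: AlignedPassage, keep_threshold: float = 0.8) -> str:
    """Illustrative heuristic label; NOT the relation scheme from the AbLit paper."""
    orig, abr = tokens(p.original), tokens(p.abridged)
    if not abr:
        return "omitted"
    if orig == abr:
        return "unchanged"
    # Share of abridged words that appear, in order, in the original passage.
    matcher = SequenceMatcher(a=orig, b=abr, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks()) / len(abr)
    return "condensed" if kept >= keep_threshold else "rewritten"


if __name__ == "__main__":
    passages = [
        AlignedPassage("It was a dark and stormy night, and the rain fell in torrents.",
                       "It was a dark and stormy night."),
        AlignedPassage("He paused at the door.", "He paused at the door."),
        AlignedPassage("The clock struck twelve somewhere far below.", ""),
        AlignedPassage("She laughed, long and loud, at the absurd suggestion.",
                       "She found the idea ridiculous."),
    ]
    for p in passages:
        print(coarse_relation(p), round(p.compression_ratio(), 2))
    # condensed 0.54
    # unchanged 1.0
    # omitted 0.0
    # rewritten 0.56
```

A pure-deletion abridgement keeps nearly all of its words from the original, so it is labeled `condensed`, while a paraphrase shares few words and is labeled `rewritten`. A learned model, as described in the abstract, would be needed to capture relations that simple word overlap misses.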


Code & Models

Repositories

roemmele/ablit (official)

Models

LanguageWeaver/ablit-bart-base (Hugging Face model, 8 downloads)

Datasets

roemmele/ablit (Hugging Face dataset, 63 downloads)


Taxonomy

Topics: Natural Language Processing Techniques · Digital Humanities and Scholarship · Text Readability and Simplification