Simplifying Scholarly Abstracts for Accessible Digital Libraries

Haining Wang; Jason Clark

arXiv:2408.03899·cs.CL·August 8, 2024

TL;DR

This paper presents a method for fine-tuning language models to simplify scholarly abstracts, making scientific literature more accessible to a broader audience, including those with lower reading levels.

Contribution

The authors created a specialized corpus and fine-tuned multiple models to improve abstract readability while preserving content fidelity, offering a more accessible alternative to commercial models.

Findings

01. Models improved readability by over three grade levels.

02. Models maintained semantic fidelity to original abstracts.

03. Proposed models are more compact and privacy-preserving.
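To make "grade levels" concrete, here is a minimal sketch of how a readability gain between an original and a simplified text could be measured. It assumes the Flesch-Kincaid grade level formula; this page does not state which readability metric the authors used, so that choice is an assumption. The function names (`count_syllables`, `fk_grade`, `grade_improvement`) are hypothetical, and the syllable counter is a crude vowel-group heuristic rather than a dictionary-based one.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of vowels, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def grade_improvement(original: str, simplified: str) -> float:
    """Positive values mean the simplified text reads at a lower grade level."""
    return fk_grade(original) - fk_grade(simplified)


if __name__ == "__main__":
    # Illustrative sentences only; not taken from the authors' corpus.
    original = "Digital libraries curate vast collections of scientific literature."
    simplified = "Libraries keep many science papers."
    print(f"grade-level improvement: {grade_improvement(original, simplified):.2f}")
```

Under this assumed metric, a reported gain of "over three grade levels" would correspond to `grade_improvement` returning a value above 3 when averaged over the evaluated abstracts.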

Abstract

Standing at the forefront of knowledge dissemination, digital libraries curate vast collections of scientific literature. However, these scholarly writings are often laden with jargon and tailored for domain experts rather than the general public. As librarians, we strive to offer services to a diverse audience, including those with lower reading levels. To extend our services beyond mere access, we propose fine-tuning a language model to rewrite scholarly abstracts into more comprehensible versions, thereby making scholarly literature more accessible when requested. We began by introducing a corpus specifically designed for training models to simplify scholarly abstracts. This corpus consists of over three thousand pairs of abstracts and significance statements from diverse disciplines. We then fine-tuned four language models using this corpus. The outputs from the models were…

Code & Models

Repositories

Wang-Haining/scholarly_abstract_simplification (PyTorch, official)

Taxonomy

Topics: Library Science and Information Systems · Semantic Web and Ontologies · Digital Rights Management and Security