Simplifying Scholarly Abstracts for Accessible Digital Libraries
Haining Wang, Jason Clark

TL;DR
This paper presents a method for fine-tuning language models to simplify scholarly abstracts, making scientific literature accessible to a broader audience, including readers with lower reading levels.
Contribution
The authors created a specialized corpus of paired abstracts and significance statements, then fine-tuned four language models on it to improve abstract readability while preserving content fidelity. The result is a more compact, privacy-preserving alternative to commercial models.
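The summary does not say how content fidelity was scored. As a minimal stand-in, the sketch below computes cosine similarity between bag-of-words term-count vectors of an abstract and its rewrite. This is an assumption for illustration only: embedding-based similarity captures paraphrase far better than word overlap, and `bow` and `cosine` are hypothetical helper names, not the authors' code.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased term counts; a crude stand-in for a semantic representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two bag-of-words vectors (0.0 if either is empty)."""
    va, vb = bow(a), bow(b)
    # Only shared terms contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0
```

A score near 1.0 means the rewrite reuses most of the original's vocabulary, which is why word overlap tends to penalize legitimate simplifications that swap jargon for plain words.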
Findings
The fine-tuned models improved readability by more than three grade levels.
The models maintained semantic fidelity to the original abstracts.
The proposed models are more compact and more privacy-preserving than commercial models.
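The summary reports readability gains in grade levels but does not name the metric. Flesch-Kincaid Grade Level (FKGL) is one widely used grade-level formula: 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. Assuming FKGL purely for illustration, the sketch below computes it with naive sentence splitting and a heuristic syllable counter; `count_syllables` and `fk_grade` are hypothetical names.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a likely silent trailing 'e' (but keep it in endings like "-le").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Under this assumed metric, a gain of "over three grade levels" would mean `fk_grade(original) - fk_grade(simplified) > 3`. The syllable heuristic is crude, so in practice a maintained package such as `textstat` is usually preferred.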
Abstract
Standing at the forefront of knowledge dissemination, digital libraries curate vast collections of scientific literature. However, these scholarly writings are often laden with jargon and tailored for domain experts rather than the general public. As librarians, we strive to offer services to a diverse audience, including those with lower reading levels. To extend our services beyond mere access, we propose fine-tuning a language model to rewrite scholarly abstracts into more comprehensible versions, thereby making scholarly literature more accessible when requested. We began by introducing a corpus specifically designed for training models to simplify scholarly abstracts. This corpus consists of over three thousand pairs of abstracts and significance statements from diverse disciplines. We then fine-tuned four language models using this corpus. The outputs from the models were…
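The abstract describes training on pairs of abstracts (source) and significance statements (target). A minimal sketch of turning such pairs into supervised fine-tuning records is shown below. The instruction wording, the `prompt`/`completion` field names, and the `Pair`, `to_example`, and `to_jsonl` helpers are all hypothetical; the summary does not describe the authors' prompt or data format.

```python
import json
from dataclasses import dataclass


@dataclass
class Pair:
    """One corpus entry: a scholarly abstract and its significance statement."""
    abstract: str      # source text (expert-oriented)
    significance: str  # target text (plain-language)


# Hypothetical instruction; the summary does not state the prompt the authors used.
INSTRUCTION = "Rewrite this scholarly abstract so a general reader can understand it."


def to_example(pair: Pair) -> dict:
    """Map one pair to a hypothetical prompt/completion training record."""
    return {
        "prompt": f"{INSTRUCTION}\n\n{pair.abstract.strip()}\n",
        "completion": pair.significance.strip(),
    }


def to_jsonl(pairs: list[Pair]) -> str:
    """Serialize pairs as JSON Lines, one training example per line."""
    return "".join(json.dumps(to_example(p), ensure_ascii=False) + "\n" for p in pairs)
```

Keeping the instruction fixed across all records means the model learns a single rewriting task, and JSON Lines lets a corpus of a few thousand pairs be streamed or split into training and validation sets line by line.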
Taxonomy
Topics: Library Science and Information Systems · Semantic Web and Ontologies · Digital Rights Management and Security
