Not too long do read: Evaluating LLM-generated extreme scientific summaries

Zhuoqi Lyu; Qing Ke

arXiv:2512.23206 · cs.CL · December 30, 2025

Open Access 1 Datasets

TL;DR

This paper introduces BiomedTLDR, a new dataset of researcher-authored scientific summaries, and evaluates how well large language models generate scientific TLDRs, finding that their summaries lean toward extraction over abstraction.

Contribution

The paper presents BiomedTLDR, a novel high-quality dataset for scientific summarization, and provides an empirical evaluation of LLMs' performance in generating scientific TLDRs.

Findings

01

LLMs tend to produce more extractive summaries than human experts.

02

Some LLMs can generate human-like summaries, but they often stay close to the source text's lexical choices and rhetorical structure.

03

The dataset enables better evaluation and development of scientific summarization models.
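
The contrast between extractive and abstractive summaries in the first finding can be made concrete with a simple overlap measure. The sketch below is an illustration only, not the paper's method: it assumes a "novel n-gram ratio" (the share of summary n-grams that never appear in the source abstract), and the function names and example sentences are hypothetical.

```python
import re


def ngrams(text, n):
    """Lowercase the text, split it into alphanumeric tokens, and return its n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of summary n-grams that never occur in the source text.

    0.0 means every n-gram was copied from the source (fully extractive);
    1.0 means none were (fully abstractive at this n-gram size).
    """
    summary_grams = ngrams(summary, n)
    if not summary_grams:
        return 0.0
    source_grams = set(ngrams(source, n))
    novel = sum(1 for gram in summary_grams if gram not in source_grams)
    return novel / len(summary_grams)


# Hypothetical example texts, not taken from BiomedTLDR.
abstract = "We propose a new dataset of researcher-authored summaries."
extractive = "A new dataset of researcher-authored summaries."
abstractive = "Authors' own notes become a summarization benchmark."

print(novel_ngram_ratio(abstract, extractive))
print(novel_ngram_ratio(abstract, abstractive))
```

A lower ratio indicates a summary that reuses more of the source wording. Under this assumed measure, a model whose summaries score consistently below the human-written ones would count as more extractive.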

Abstract

High-quality scientific extreme summary (TLDR) facilitates effective science communication. How do large language models (LLMs) perform in generating them? How are LLM-generated summaries different from those written by human experts? However, the lack of a comprehensive, high-quality scientific TLDR dataset hinders both the development and evaluation of LLMs' summarization ability. To address these, we propose a novel dataset, BiomedTLDR, containing a large sample of researcher-authored summaries from scientific papers, which leverages the common practice of including authors' comments alongside bibliography items. We then test popular open-weight LLMs for generating TLDRs based on abstracts. Our analysis reveals that, although some of them successfully produce humanoid summaries, LLMs generally exhibit a greater affinity for the original text's lexical choices and rhetorical…

Code & Models

Datasets

Keylab/BiomedTLDR
dataset · 6 downloads

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Text Readability and Simplification