An In-depth Evaluation of Large Language Models in Sentence Simplification with Error-based Human Assessment

Xuanxin Wu; Yuki Arase

arXiv:2403.04963 · cs.CL · July 15, 2025 · 1 citation

Open Access

TL;DR

This paper evaluates large language models' sentence simplification abilities using an error-based human assessment framework, revealing their strengths, limitations, and the inadequacy of current automatic metrics for high-quality simplifications.

Contribution

It introduces an error-based human annotation framework for more reliable evaluation of LLMs in sentence simplification, addressing limitations of existing methods.

Findings

01. LLMs generate fewer errors than previous models.

02. GPT-4 and Qwen2.5-72B struggle with lexical paraphrasing.

03. Automatic metrics lack sensitivity for high-quality LLM outputs.

Abstract

Recent studies have used both automatic metrics and human evaluations to assess the simplification abilities of LLMs. However, the suitability of existing evaluation methodologies for LLMs remains in question. First, the suitability of current automatic metrics on LLMs' simplification evaluation is still uncertain. Second, current human evaluation approaches in sentence simplification often fall into two extremes: they are either too superficial, failing to offer a clear understanding of the models' performance, or overly detailed, making the annotation process complex and prone to inconsistency, which in turn affects the evaluation's reliability. To address these problems, this study provides in-depth insights into LLMs' performance while ensuring the reliability of the evaluation. We design an error-based human annotation framework to assess the LLMs' simplification capabilities. We…

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Layer Normalization · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing