Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance

Nicolas Devatine; Louis Abraham

arXiv:2412.17321 · cs.CL · December 24, 2024

TL;DR

This paper introduces a compression-based edit distance metric using Lempel-Ziv-77 to better quantify human editing effort on texts generated by LLMs, outperforming traditional metrics in accuracy and efficiency.

Contribution

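The paper presents a novel compression-based edit distance metric that accurately measures editing effort and correlates with actual post-editing time, addressing a key limitation of existing metrics such as Levenshtein, BLEU, ROUGE, and TER: they struggle with substantial edits such as block operations.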

Findings

01

The proposed metric correlates strongly with actual editing effort.

02

It captures complex edits more effectively than traditional metrics.

03

The method has linear computational complexity.
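The authors' implementation is not reproduced here. As a swapped-in illustration of a compression-based distance that also runs in roughly linear time, the sketch below uses Python's standard `zlib` module (DEFLATE, which combines LZ77 with Huffman coding) to compute the normalized compression distance (NCD), a well-known general-purpose measure. This is an assumption for illustration, not the paper's metric; `compressed_size`, `ncd`, and the sample texts are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed to store `data` after DEFLATE (LZ77 + Huffman) compression."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for near-identical texts,
    typically close to 1 when `y` shares little that helps compress it given `x`."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # the second text may back-reference the first
    return (cxy - min(cx, cy)) / max(cx, cy)


original = ("The invoice was issued on March 3 and paid by bank transfer. "
            "The supplier applied the reduced VAT rate to the service. "
            "Please attach the receipt to the accounting entry.")
# Same sentences in a different order: a block move, not new content.
moved = ("Please attach the receipt to the accounting entry. "
         "The invoice was issued on March 3 and paid by bank transfer. "
         "The supplier applied the reduced VAT rate to the service.")
unrelated = ("Tomorrow the weather should stay dry in the north, with light winds. "
             "A few showers may reach the coast late in the evening. "
             "Temperatures will climb slightly over the weekend.")

for label, text in [("identical", original), ("moved", moved), ("unrelated", unrelated)]:
    print(f"{label:>9}: {ncd(original, text):.3f}")
```

The reordered text scores far closer to the original than the unrelated one, because DEFLATE encodes each moved sentence as a single back-reference. DEFLATE only looks back 32 KiB, so longer documents would need a compressor with a larger window.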

Abstract

Assessing the extent of human edits on texts generated by Large Language Models (LLMs) is crucial to understanding human-AI interaction and to improving the quality of automated text generation systems. Existing metrics such as Levenshtein, BLEU, ROUGE, and TER often fail to measure the effort required for post-editing accurately, especially when edits involve substantial modifications such as block operations. In this paper, we introduce a novel compression-based edit distance metric grounded in the Lempel-Ziv-77 algorithm, designed to quantify the amount of post-editing applied to LLM-generated texts. Our method leverages the properties of text compression to measure the informational difference between the original and edited texts. Through experiments on real-world datasets of human edits, we demonstrate that our proposed metric is highly correlated with actual edit…
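The abstract gives the idea but not the formula. As a rough, hypothetical illustration of why an LZ77-style view suits block edits, the sketch below greedily parses the edited text into phrases copied from the original (a relative Lempel-Ziv parse; this specific construction is an assumption made here, not the authors' definition) and compares the phrase count with Levenshtein distance on a block move. `lz_factorize` and `levenshtein` are illustrative names, and the naive search is far slower than the linear-time method the paper reports.

```python
def lz_factorize(reference: str, edited: str) -> list[tuple]:
    """Greedily split `edited` into phrases taken from `reference`.

    Each phrase is ("copy", start, length) for the longest substring of
    `reference` matching at the current position, or ("literal", ch) when
    `ch` never occurs in `reference`. Naive search, for illustration only.
    """
    factors: list[tuple] = []
    i = 0
    while i < len(edited):
        best_len, best_pos = 0, -1
        for j in range(len(reference)):
            # Length of the common run starting at reference[j] and edited[i].
            k = 0
            while (i + k < len(edited) and j + k < len(reference)
                   and edited[i + k] == reference[j + k]):
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            factors.append(("literal", edited[i]))  # brand-new character
            i += 1
        else:
            factors.append(("copy", best_pos, best_len))  # reused block
            i += best_len
    return factors


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Swapping two halves is a single block move for a human editor.
original, edited = "ABCDEF", "DEFABC"
print(lz_factorize(original, edited))  # [('copy', 3, 3), ('copy', 0, 3)]
print(levenshtein(original, edited))   # 6
```

Two copied phrases versus six character operations: the phrase count stays small for block moves, which character-level metrics like Levenshtein penalize heavily. Normalizing the count into a score and making the parse linear-time are left out of this sketch.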

Datasets

Tiime/fr-qa-accounting-edits
dataset

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing