Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance
Nicolas Devatine, Louis Abraham

TL;DR
This paper introduces a compression-based edit distance metric, built on the Lempel-Ziv-77 algorithm, to quantify human editing effort on LLM-generated texts. It tracks actual post-editing effort more closely than traditional metrics and runs in linear time.
Contribution
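The paper presents a novel compression-based edit distance metric that measures editing effort and correlates with actual post-editing time. It addresses a limitation of existing metrics such as Levenshtein, BLEU, ROUGE, and TER, which often mismeasure effort when edits involve block operations.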
Findings
The proposed metric correlates strongly with actual editing effort.
It captures complex edits more effectively than traditional metrics.
The method has linear computational complexity.
Abstract
Assessing the extent of human edits on texts generated by Large Language Models (LLMs) is crucial to understanding the human-AI interactions and improving the quality of automated text generation systems. Existing edit distance metrics, such as Levenshtein, BLEU, ROUGE, and TER, often fail to accurately measure the effort required for post-editing, especially when edits involve substantial modifications, such as block operations. In this paper, we introduce a novel compression-based edit distance metric grounded in the Lempel-Ziv-77 algorithm, designed to quantify the amount of post-editing applied to LLM-generated texts. Our method leverages the properties of text compression to measure the informational difference between the original and edited texts. Through experiments on real-world human edits datasets, we demonstrate that our proposed metric is highly correlated with actual edit…
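The idea of measuring the "informational difference" between an original and an edited text can be illustrated with a toy LZ77-style factor count. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `lz77_factor_count` is hypothetical, the greedy parsing ignores self-overlapping matches, and the naive substring search is far from the linear-time behavior reported for the paper's method. It counts how many phrases are needed to encode the edited text when the original is already available as a dictionary, so a moved block costs one phrase rather than many character edits.

```python
def lz77_factor_count(source: str, target: str) -> int:
    """Count greedy LZ77-style factors needed to encode `target`
    when `source` is already known (illustrative sketch only)."""
    i = 0
    factors = 0
    n = len(target)
    while i < n:
        # Everything seen so far: the original text plus the part of
        # the edited text that has already been encoded.
        history = source + target[:i]
        length = 0
        # Extend the match while the candidate phrase occurs in history.
        while i + length < n and target[i:i + length + 1] in history:
            length += 1
        # A phrase is either a copied match or a single new literal.
        i += max(length, 1)
        factors += 1
    return factors


# Unchanged text is encoded as a single copy of the original.
print(lz77_factor_count("abc", "abc"))                  # 1
# Swapping two words needs only three phrases: "world", " ", "hello".
print(lz77_factor_count("hello world", "world hello"))  # 3
```

In this toy setting, a block swap that a character-level edit distance would score as many substitutions costs only three phrases, which is the kind of behavior the abstract attributes to a compression-based metric.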
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
