InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li

TL;DR
This paper introduces InfoLossQA, a framework for identifying and recovering information lost during text simplification. It represents the lost information as question-and-answer (QA) pairs that help readers understand what was omitted, and it is evaluated on simplified scientific abstracts of medical studies.
Contribution
The paper presents a novel QA-based framework and methods to characterize and recover information loss in text simplification, with a new dataset and evaluation approach.
Findings
Information loss is frequent in text simplification.
QA pairs effectively summarize lost information.
Models currently struggle to reliably identify information loss.
Abstract
Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural…
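The abstract names end-to-end prompting of language models as one way to produce these QA pairs. The following is a minimal sketch of that idea, not the authors' implementation: the prompt wording, the JSON reply format, the `InfoLossQA` record and its field names, and the `omission`/`vagueness` labels (inferred from the abstract's mention of deleted information and vagueness) are all hypothetical, and no model is actually called.

```python
import json
from dataclasses import dataclass

# Hypothetical loss labels, assumed from the abstract's mention of
# deleted information ("omission") and vagueness.
LOSS_TYPES = {"omission", "vagueness"}


@dataclass(frozen=True)
class InfoLossQA:
    """One QA pair describing information lost in a simplification (hypothetical schema)."""
    question: str   # what a reader of the simplified text might ask
    answer: str     # answer recovered from the original, technical text
    loss_type: str  # hypothetical label: "omission" or "vagueness"


def build_prompt(original: str, simplified: str) -> str:
    """Assemble a hypothetical end-to-end prompt asking a model for QA pairs."""
    return (
        "Compare the original text with its simplification. For each piece of "
        "information that was omitted or made vague, write a question a reader "
        "might ask and answer it using the original text.\n"
        'Respond with a JSON list of objects with keys "question", "answer", '
        'and "loss_type" ("omission" or "vagueness").\n\n'
        f"Original:\n{original}\n\nSimplified:\n{simplified}\n"
    )


def parse_response(raw: str) -> list[InfoLossQA]:
    """Parse a model's JSON reply, skipping entries with missing fields or unknown labels."""
    pairs = []
    for item in json.loads(raw):
        if not isinstance(item, dict):
            continue  # ignore anything that is not a JSON object
        question = str(item.get("question", "")).strip()
        answer = str(item.get("answer", "")).strip()
        loss_type = str(item.get("loss_type", ""))
        if question and answer and loss_type in LOSS_TYPES:
            pairs.append(InfoLossQA(question, answer, loss_type))
    return pairs
```

Validating fields and labels at parse time keeps malformed model output out of the collected QA set, which matters given the finding that models currently struggle to identify information loss reliably.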
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
