SUMIE: A Synthetic Benchmark for Incremental Entity Summarization
Eunjeong Hwang, Yichao Zhou, Beliz Gunel, James Bradley Wendt, and Sandeep Tata

TL;DR
SUMIE is a synthetic benchmark dataset designed to evaluate how well language models can incrementally update entity summaries, revealing current limitations and guiding future improvements.
Contribution
We introduce SUMIE, a novel synthetic dataset that captures real-world complexities for incremental entity summarization, and provide an evaluation framework for LLMs on this task.
Findings
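State-of-the-art LLMs struggle with Incremental Entity Summarization (IES), achieving F1 scores below 80.4%.
The dataset exposes issues such as incorrect entity association and incomplete information presentation.
The alignment between generated summaries and paragraphs exceeds 96%, confirming the dataset's quality.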
Abstract
No existing dataset adequately tests how well language models can incrementally update entity summaries, a crucial ability as these models rapidly advance. The Incremental Entity Summarization (IES) task is vital for maintaining accurate, up-to-date knowledge. To address this, we introduce SUMIE, a fully synthetic dataset designed to expose real-world IES challenges. This dataset effectively highlights problems like incorrect entity association and incomplete information presentation. Unlike common synthetic datasets, ours captures the complexity and nuances found in real-world data. We generate informative and diverse attributes, summaries, and unstructured paragraphs in sequence, ensuring high quality. The alignment between generated summaries and paragraphs exceeds 96%, confirming the dataset's quality. Extensive experiments demonstrate the dataset's difficulty: state-of-the-art…
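As a rough illustration of the task the abstract describes, the Python sketch below merges facts from a new paragraph into a running entity summary and scores the result against a reference with F1. Everything in it is an assumption made for illustration: the representation of a summary as a mapping from attribute to a set of values, the function names `update_summary` and `f1_score`, the example entity data, and the exact-match scoring. It is not SUMIE's data format or evaluation protocol.

```python
from typing import Dict, Set

# Hypothetical representation: attribute name -> set of value strings.
Summary = Dict[str, Set[str]]


def update_summary(summary: Summary, new_facts: Summary) -> Summary:
    """Return a new summary with new_facts merged in; the input is not mutated."""
    merged = {attr: set(vals) for attr, vals in summary.items()}
    for attr, vals in new_facts.items():
        merged.setdefault(attr, set()).update(vals)
    return merged


def f1_score(predicted: Summary, reference: Summary) -> float:
    """Exact-match F1 over (attribute, value) pairs (an illustrative metric)."""
    pred_pairs = {(a, v) for a, vals in predicted.items() for v in vals}
    ref_pairs = {(a, v) for a, vals in reference.items() for v in vals}
    true_positives = len(pred_pairs & ref_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


# Made-up example: one incremental update followed by scoring.
running = {"menu": {"vegan options"}}
extracted = {"menu": {"seasonal specials"}, "ambiance": {"quiet"}}
reference = {"menu": {"vegan options", "seasonal specials"}, "ambiance": {"cozy"}}

updated = update_summary(running, extracted)
score = f1_score(updated, reference)
print(f"F1 after update: {score:.3f}")
```

In this toy setup, a value attached to the wrong attribute or a missed value lowers the score, which mirrors the two failure modes the abstract names: incorrect entity association and incomplete information presentation.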
Taxonomy
Topics: Data Quality and Management · Data Management and Algorithms · Topic Modeling
