Unstructured Evidence Attribution for Long Context Query Focused Summarization
Dustin Wright, Zain Muhammad Mujahid, Lu Wang, Isabelle Augenstein, and David Jurgens

TL;DR
This paper introduces a new method for extracting unstructured evidence spans to improve the relevance and factual consistency of summaries generated by large language models, especially for long contexts.
Contribution
The authors propose a synthetic dataset (SUnsET) and a generation pipeline for training models to extract unstructured evidence, improving summarization quality over fixed-granularity methods.
Findings
LLMs trained with SUnsET produce more relevant evidence
These models extract evidence from more diverse locations in the context
Their summaries show better factual consistency
Abstract
Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query, and extracting and citing evidence spans helps improve the trustworthiness of these summaries. Whereas previous work has focused on evidence citation with fixed levels of granularity (e.g. sentence, paragraph, document, etc.), we propose to extract unstructured (i.e., spans of any length) evidence in order to acquire more relevant and consistent evidence than in the fixed granularity case. We show how existing systems struggle to copy and properly cite unstructured evidence, which also tends to be "lost-in-the-middle". To help models perform this task, we create the Summaries with Unstructured Evidence Text dataset (SUnsET), a synthetic dataset generated using a novel pipeline, which can be used as training supervision for unstructured evidence summarization. We…
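To make the idea of unstructured evidence concrete, here is a minimal sketch, not the authors' implementation, of how one might check that cited spans of arbitrary length were copied verbatim from the context and where in the context they fall (useful for spotting a "lost-in-the-middle" pattern). The names `EvidenceCheck` and `check_evidence` are hypothetical, and exact substring matching is an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvidenceCheck:
    span: str
    found: bool                     # True if the span appears verbatim in the context
    rel_position: Optional[float]   # 0.0 = start of context, 1.0 = end; None if not found


def check_evidence(context: str, spans: List[str]) -> List[EvidenceCheck]:
    """Check each cited span against the context (hypothetical helper).

    A span counts as valid evidence only if it is an exact substring of
    the context. For valid spans, the relative position of the span's
    midpoint is reported.
    """
    results = []
    for span in spans:
        # Empty spans are treated as invalid rather than trivially matching.
        idx = context.find(span) if span else -1
        if idx == -1:
            results.append(EvidenceCheck(span, False, None))
        else:
            midpoint = idx + len(span) / 2
            results.append(EvidenceCheck(span, True, midpoint / len(context)))
    return results
```

Aggregating `rel_position` over many summaries gives a rough picture of whether evidence is drawn mostly from the beginning and end of the context or from throughout it.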
Taxonomy
Topics: Data Management and Algorithms · Data Quality and Management · Data Mining Algorithms and Applications
Methods: Balanced Selection
