Attribute or Abstain: Large Language Models as Long Document Assistants
Jan Buchmann, Xiao Liu, Iryna Gurevych

TL;DR
This paper introduces LAB, a benchmark of 6 diverse long document tasks with attribution, and uses it to compare different approaches to attribution on 5 LLMs of different sizes.
Contribution
It provides the first long document-specific evaluation of attribution methods, comparing their effectiveness across various LLM sizes and tasks.
Findings
Citation (response generation and evidence extraction in one step) performs best for large and fine-tuned models.
Additional retrieval can help small, prompted models.
Evidence quality predicts response quality for simple responses.
Abstract
LLMs can help humans working with long documents, but are known to hallucinate. Attribution can increase trust in LLM responses: The LLM provides evidence that supports its response, which enhances verifiability. Existing approaches to attribution have only been evaluated in RAG settings, where the initial retrieval confounds LLM performance. This is crucially different from the long document setting, where retrieval is not needed, but could help. Thus, a long document specific evaluation of attribution is missing. To fill this gap, we present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiments with different approaches to attribution on 5 LLMs of different sizes. We find that citation, i.e. response generation and evidence extraction in one step, performs best for large and fine-tuned models, while additional retrieval can help for small, prompted…
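To make the terms in the abstract concrete, here is a minimal sketch of what "citation" (response and evidence produced in one step) and "additional retrieval" could look like around a model call. It is not the authors' implementation: the bracketed segment markers, the literal "unanswerable" abstention string, the word-overlap retriever, and all function names are assumptions chosen for illustration, and the model call itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class AttributedResponse:
    text: str            # response with citation markers stripped
    evidence: list[int]  # indices of the cited document segments
    abstained: bool      # True if the model declined to answer


def build_citation_prompt(segments: list[str], question: str) -> str:
    """Number each segment so the model can cite it as [i] (assumed format)."""
    numbered = "\n".join(f"[{i}] {seg}" for i, seg in enumerate(segments))
    return (
        f"{numbered}\n\nQuestion: {question}\n"
        "Answer and cite supporting segments like [3]. "
        "If the document does not contain the answer, reply 'unanswerable'."
    )


def retrieve(segments: list[str], question: str, k: int = 2) -> list[int]:
    """Toy lexical retriever: keep the k segments sharing most words with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))

    def overlap(i: int) -> int:
        return len(q_words & set(re.findall(r"\w+", segments[i].lower())))

    top = sorted(range(len(segments)), key=lambda i: (-overlap(i), i))[:k]
    return sorted(top)  # restore document order


def parse_citation_output(raw: str, n_segments: int) -> AttributedResponse:
    """Split a raw model output into response text and cited segment indices."""
    cited = [int(m) for m in re.findall(r"\[(\d+)\]", raw)]
    evidence = sorted({i for i in cited if 0 <= i < n_segments})
    text = re.sub(r"\s*\[\d+\]", "", raw).strip()
    abstained = text.lower().rstrip(".") == "unanswerable"
    return AttributedResponse(text, evidence, abstained)
```

In this sketch, the citation setting passes all segments to `build_citation_prompt`, while the additional-retrieval setting would first narrow the document with `retrieve` and prompt with only the selected segments. `parse_citation_output` drops out-of-range citations so that the evidence always points at real segments.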
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Adam · Dropout
