L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?

Zecheng Tang; Keyan Zhou; Juntao Li; Baibei Ji; Jianye Hou; Min Zhang

arXiv:2410.02115·cs.CL·October 7, 2024



TL;DR

This paper introduces L-CiteEval, a comprehensive benchmark for assessing long-context models' understanding and faithfulness across diverse tasks, revealing current models' limitations in citation accuracy and the benefits of retrieval-augmented generation.

Contribution

L-CiteEval provides a multi-task, fully automated benchmark for evaluating long-context models' understanding and faithfulness, highlighting the gap between open-source and closed-source models.

Findings

01

Open-source models lag behind closed-source models in citation accuracy.

02

RAG improves faithfulness but slightly reduces generation quality.

03

Attention mechanisms correlate with citation generation processes.

Abstract

Long-context models (LCMs) have made remarkable strides in recent years, offering users great convenience for handling tasks that involve long context, such as document summarization. As the community increasingly prioritizes the faithfulness of generated results, merely ensuring the accuracy of LCM outputs is insufficient, as it is quite challenging for humans to verify the results against the extremely lengthy context. Although some efforts have been made to assess whether LCMs truly respond based on the context, these works are either limited to specific tasks or rely heavily on external evaluation resources like GPT4. In this work, we introduce L-CiteEval, a comprehensive multi-task benchmark for long-context understanding with citations, aiming to evaluate both the understanding capability and faithfulness of LCMs. L-CiteEval covers 11 tasks from diverse domains, spanning context…


Code & Models


Datasets

Jonaszky123/L-CiteEval (dataset · 506 downloads)


Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · WordPiece · Attention Dropout · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Dropout · Byte Pair Encoding · BERT