ContextCite: Attributing Model Generation to Context

Benjamin Cohen-Wang; Harshay Shah; Kristian Georgiev; Aleksander Madry

arXiv:2409.00729 · cs.LG · September 17, 2024 · 2 cites

TL;DR

This paper introduces ContextCite, a scalable method for attributing language model outputs to specific parts of their input context, aiding in verification, response improvement, and attack detection.

Contribution

The paper presents a novel, scalable approach called ContextCite for identifying which parts of context influence model outputs, applicable to any language model.

Findings

01. Effective in verifying generated statements
02. Improves response quality by context pruning
03. Detects poisoning attacks

Abstract

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
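
To make the idea of context attribution concrete, here is a minimal sketch using leave-one-out ablation: remove each context source in turn and record how much a score for the generated statement drops. This is a simple illustrative baseline, not necessarily the procedure ContextCite itself uses, which this page does not describe. The names `leave_one_out_attribution`, `prune_context`, and `toy_score` are hypothetical, and `toy_score` is a keyword-overlap stand-in for a real language model's log-probability of the statement given the context.

```python
from typing import Callable, List


def leave_one_out_attribution(
    sources: List[str],
    score: Callable[[List[str]], float],
) -> List[float]:
    """Score each source by how much the statement score drops without it."""
    full = score(sources)
    return [
        full - score(sources[:i] + sources[i + 1:])
        for i in range(len(sources))
    ]


def prune_context(sources: List[str], scores: List[float], k: int) -> List[str]:
    """Keep the k highest-scoring sources, preserving their original order."""
    top = sorted(range(len(sources)), key=lambda i: scores[i], reverse=True)[:k]
    return [sources[i] for i in sorted(top)]


def toy_score(sources: List[str]) -> float:
    # Hypothetical stand-in for a model: counts how many terms of the
    # generated statement are supported by words in the remaining context.
    statement_terms = {"paris", "capital", "france"}
    seen = set()
    for s in sources:
        seen |= {w.strip(".,").lower() for w in s.split()}
    return float(len(statement_terms & seen))


context = [
    "Paris is the capital of France.",
    "The Seine flows through the city.",
    "Bananas are rich in potassium.",
]
scores = leave_one_out_attribution(context, toy_score)
pruned = prune_context(context, scores, k=1)
```

In this toy setting only the first sentence receives a nonzero score, so pruning to one source keeps it and drops the rest. With a real model, the scorer would be replaced by the model's probability of the generated statement under the ablated context, at the cost of one forward pass per ablation.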

Code & Models

Repositories

madrylab/context-cite (PyTorch, official)

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Semantic Web and Ontologies

Methods: Pruning