Barriers to Counterfactual Credit Attribution for Autoregressive Models
Aloni Cohen, Chenhao Zhang

TL;DR
This paper studies counterfactual credit attribution (CCA) for autoregressive generative models and identifies fundamental barriers to two natural approaches for achieving it.
Contribution
It identifies key barriers to implementing counterfactual credit attribution in autoregressive models, including non-compositionality and exponential query complexity in retrofitting methods.
Findings
CCA does not compose autoregressively, unlike differential privacy.
Imposing CCA on the next-token predictor does not guarantee that the model is CCA.
Retrofitting a model for CCA requires query complexity exponential in the output length.
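The "unlike differential privacy" contrast rests on a standard fact: pure ε-DP composes sequentially. If each next-token step is ε-DP, then any T-token generation is Tε-DP. The Python sketch below illustrates only this DP side. It rests on assumptions that do not come from the paper:

- a toy three-token vocabulary;
- a hypothetical dataset of short documents standing in for a deployment-time (e.g., RAG) database;
- a next-token predictor built from the exponential mechanism over a sensitivity-1 bigram score.

All names, such as `bigram_score` and `next_token_probs`, are illustrative. The code checks the Tε bound by brute force over every sequence. It does not model CCA itself.

```python
import itertools
import math

# Hypothetical toy vocabulary (an assumption for illustration).
VOCAB = ["a", "b", "c"]


def bigram_score(docs, prev, tok):
    """Count the documents containing the bigram (prev, tok).

    Each document contributes 0 or 1, so adding or removing one
    document changes every score by at most 1 (sensitivity 1).
    """
    return sum(
        1
        for d in docs
        if any(d[i] == prev and d[i + 1] == tok for i in range(len(d) - 1))
    )


def next_token_probs(docs, prev, eps):
    """Exponential mechanism: P(tok) is proportional to exp(eps * score / 2).

    With a sensitivity-1 score, one sampling step is eps-DP in docs.
    """
    weights = [math.exp(eps * bigram_score(docs, prev, t) / 2) for t in VOCAB]
    total = sum(weights)
    return {t: w / total for t, w in zip(VOCAB, weights)}


def sequence_log_prob(docs, seq, eps, start="a"):
    """Log-probability of generating seq autoregressively after `start`."""
    logp, prev = 0.0, start
    for tok in seq:
        logp += math.log(next_token_probs(docs, prev, eps)[tok])
        prev = tok
    return logp


def max_log_ratio(docs, neighbor, length, eps):
    """Worst-case |log P_docs(seq) - log P_neighbor(seq)| over all sequences."""
    return max(
        abs(sequence_log_prob(docs, seq, eps) - sequence_log_prob(neighbor, seq, eps))
        for seq in itertools.product(VOCAB, repeat=length)
    )


# Hypothetical deployment-time dataset and a neighbor with one document removed.
docs = [["a", "b", "c"], ["a", "b", "b"], ["c", "a", "b"]]
neighbor = docs[:-1]
eps, T = 0.5, 4

worst = max_log_ratio(docs, neighbor, T, eps)
print(worst <= T * eps)  # True: sequential composition bounds the ratio by T * eps
```

The brute-force check enumerates all |VOCAB|^T sequences, so it is practical only at toy sizes. The point is that per-step ε guarantees add up pointwise along any generated sequence. According to the paper, no analogous argument lifts a CCA next-token predictor to a CCA model.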
Abstract
Generative AI disrupts the practice of giving credit to work that came before. Ideally, a generative model would give credit to any work on which its output depends in a significant way. Counterfactual credit attribution (CCA), a relaxation of differential privacy (DP), is a technical condition formalizing this goal. It was recently introduced by Livni, Moran, Nissim, and Pabbaraju [2024], who studied it in the PAC learning setting. We initiate the study of CCA generative models. Specifically, we consider autoregressive models giving credit to a deployment-time dataset (e.g., a RAG database). We uncover barriers to two natural approaches to CCA autoregressive models. First, we show that imposing CCA on the underlying next-token predictor does not guarantee that the model is CCA: CCA does not compose autoregressively (unlike DP). Second, we consider a different approach to building CCA models…
