Propositional Interpretability in Artificial Intelligence

David J. Chalmers

arXiv:2501.15740 · cs.AI · January 28, 2025 · 2 cites

TL;DR

This paper argues for propositional interpretability in AI: explaining what a system is doing in terms of its propositional attitudes, such as beliefs, desires, and subjective probabilities. It sets out the challenge of thought logging and evaluates how well current interpretability methods meet it.

Contribution

The paper presents propositional interpretability as a key framework for explaining AI systems, and assesses how well existing interpretability methods and philosophical approaches serve that framework.

Findings

01. Propositional attitudes are central to human and AI interpretability.

02. Thought logging is a critical challenge for propositional interpretability.

03. Current interpretability methods have specific strengths and weaknesses.

Abstract

Mechanistic interpretability is the program of explaining what AI systems are doing in terms of their internal mechanisms. I analyze some aspects of the program, along with setting out some concrete challenges and assessing progress to date. I argue for the importance of propositional interpretability, which involves interpreting a system's mechanisms and behavior in terms of propositional attitudes: attitudes (such as belief, desire, or subjective probability) to propositions (e.g. the proposition that it is hot outside). Propositional attitudes are the central way that we interpret and explain human beings and they are likely to be central in AI too. A central challenge is what I call thought logging: creating systems that log all of the relevant propositional attitudes in an AI system over time. I examine currently popular methods of interpretability (such as probing, sparse…
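To make the idea of thought logging concrete, here is a minimal Python sketch of what a log of propositional attitudes over time could look like as a data structure. It is only an illustration under stated assumptions: the names `PropositionalAttitude`, `LogEntry`, and `ThoughtLog` are hypothetical and do not come from the paper, and the sketch says nothing about the hard part, which is extracting these attitudes from a real system's internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PropositionalAttitude:
    """An attitude toward a proposition (hypothetical representation)."""
    attitude: str                   # e.g. "belief", "desire", "credence"
    proposition: str                # e.g. "it is hot outside"
    degree: Optional[float] = None  # subjective probability, where applicable


@dataclass(frozen=True)
class LogEntry:
    """One attitude attributed to the system at a given time step."""
    step: int
    attitude: PropositionalAttitude


@dataclass
class ThoughtLog:
    """Append-only record of attitudes over time (illustrative only)."""
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, step: int, attitude: PropositionalAttitude) -> None:
        self.entries.append(LogEntry(step, attitude))

    def at(self, step: int) -> List[PropositionalAttitude]:
        # Return every attitude logged at the given time step, in order.
        return [e.attitude for e in self.entries if e.step == step]


# Usage: attitudes here are entered by hand purely for illustration.
log = ThoughtLog()
log.record(0, PropositionalAttitude("belief", "it is hot outside"))
log.record(0, PropositionalAttitude("credence", "it will rain tomorrow", 0.3))
log.record(1, PropositionalAttitude("desire", "the room is cool"))
```

The sketch separates the attitude type from the proposition, mirroring the abstract's definition, and indexes entries by time step because the log is meant to track attitudes over time.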

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)