Direct Preference Optimization for Adaptive Concept-based Explanations
Jacopo Teneggi, Zhenzhen Wang, Paul H. Yi, Tianmin Shu, Jeremias Sulam

TL;DR
This paper introduces a listener-adaptive explanation method for machine learning models that uses preference optimization to generate more effective, context-aware explanations, improving human understanding and classification accuracy.
Contribution
It presents a novel iterative training approach that aligns explanations with listener preferences using pairwise feedback, enhancing interpretability in real-world scenarios.
Findings
Aligns explanations with simulated listener preferences
Improves human classification accuracy in user studies
Effective across multiple image classification datasets
Abstract
Concept-based explanation methods aim at making machine learning models more transparent by finding the most important semantic features of an input (e.g., colors, patterns, shapes) for a given prediction task. However, these methods generally ignore the communicative context of explanations, such as the preferences of a listener. For example, medical doctors understand explanations in terms of clinical markers, but patients may not, needing a different vocabulary to rationalize the same diagnosis. We address this gap with listener-adaptive explanations grounded in principles of pragmatic reasoning and the rational speech act. We introduce an iterative training procedure based on direct preference optimization where a speaker learns to compose explanations that maximize communicative utility for a listener. Our approach only needs access to pairwise preferences, which can be collected…
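The abstract names direct preference optimization over pairwise preferences as the training signal. As a rough illustration only, the sketch below computes the standard DPO loss for a single preference pair, given log-probabilities of the preferred and rejected explanation under the speaker being trained and under a frozen reference speaker. The function names, the argument names, and the value of beta are assumptions made for this example; the paper's actual objective, speaker parameterization, and iterative schedule are not given in this summary and may differ.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # hypothetical value; controls deviation from the reference
) -> float:
    """Standard DPO loss for one (chosen, rejected) explanation pair.

    All arguments are log-probabilities of a full explanation: the first two
    under the speaker being trained, the last two under the frozen reference.
    """
    # Implicit reward margin: how much more the speaker favors the chosen
    # explanation over the rejected one, relative to the reference speaker.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


# A speaker identical to the reference has zero margin, so the loss is log(2).
loss_at_reference = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Shifting probability mass toward the listener-preferred explanation
# produces a positive margin and a lower loss.
loss_after_shift = dpo_loss(-3.0, -7.0, -5.0, -5.0)
```

The loss depends only on which explanation of each pair the listener preferred, which is consistent with the abstract's statement that the approach needs access to pairwise preferences alone.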
Taxonomy
Topics: Semantic Web and Ontologies
