CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models
Yike Sun, Mingkun Xu, Mu You, Zhongzhi He, Henghua Shen, Zehan Tan, Derek F. Wong, and Tao Fang

TL;DR
This paper introduces CLIF, a method that uses influence functions to make NLP models more interpretable at both the sample and concept levels, supporting data debugging and a clearer view of how the model reaches its decisions.
Contribution
The paper presents a novel influence-function approach that provides interpretability at both the sample and concept levels in NLP models, enabling efficient data debugging and model understanding.
Findings
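On the CEBaB and Yelp datasets, influence functions effectively identify the training samples with the greatest impact on model predictions, both helpful and harmful.
Adjusting the labels and weights of these samples restores model performance to baseline levels without retraining.
Concept-level analysis identifies the key concepts within Concept Bottleneck Models (CBMs) that significantly affect predictions.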
Abstract
In recent years, the black-box nature of deep learning models has limited their application in high-stakes domains such as medical diagnosis and finance, where interpretability is essential. To address this, we propose a novel approach using influence functions to enhance interpretability in NLP models at both the sample and concept levels. Experiments on CEBaB and Yelp datasets show that influence functions effectively identify the most impactful training samples, both helpful and harmful, on model predictions. By adjusting the labels and weights of these samples, we demonstrate that model performance can be restored to baseline levels without retraining, confirming the value of influence functions for efficient data debugging. Furthermore, our concept-level analysis identifies key concepts within Concept Bottleneck Models (CBM) that significantly affect predictions. Modifying these…
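The sample-level idea in the abstract can be illustrated with the classic influence-function estimate: the effect of upweighting a training point z on the loss at a test point is approximated by -g_test^T H^{-1} g_z, where g denotes a loss gradient and H is the Hessian of the training objective at the fitted parameters. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it swaps the NLP model for a small L2-regularised logistic regression so that the Hessian can be inverted exactly, and every name in it (`fit_logreg`, `influence_on_test_loss`, the synthetic data, `lam`) is hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(x, y, theta):
    """Log loss of one example (label y in {0, 1})."""
    p = sigmoid(x @ theta)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def hessian(X, w, theta, lam):
    """Hessian of sum_i w_i * loss_i + (lam / 2) * ||theta||^2."""
    p = sigmoid(X @ theta)
    return (X.T * (w * p * (1 - p))) @ X + lam * np.eye(X.shape[1])

def fit_logreg(X, y, w, lam=0.1, iters=50):
    """Newton's method on the weighted, L2-regularised log loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (w * (sigmoid(X @ theta) - y)) + lam * theta
        theta -= np.linalg.solve(hessian(X, w, theta, lam), grad)
    return theta

def influence_on_test_loss(X, y, w, theta, x_test, y_test, lam=0.1):
    """Influence of upweighting each training point on the test loss.

    Returns one score per training point: -g_test^T H^{-1} g_i.
    A positive score means upweighting that point raises the test loss.
    """
    g_train = (sigmoid(X @ theta) - y)[:, None] * X   # per-example gradients
    g_test = (sigmoid(x_test @ theta) - y_test) * x_test
    s_test = np.linalg.solve(hessian(X, w, theta, lam), g_test)
    return -g_train @ s_test

# Synthetic stand-in data (assumption: not CEBaB or Yelp).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -1.0, 0.5]) + 0.5 * rng.normal(size=200) > 0).astype(float)
w = np.full(200, 1 / 200)                              # uniform sample weights
theta = fit_logreg(X, y, w)

x_test, y_test = np.array([1.0, 0.5, -0.5]), 1.0
scores = influence_on_test_loss(X, y, w, theta, x_test, y_test)
most_harmful = np.argsort(scores)[-5:]   # upweighting these raises the test loss most
most_helpful = np.argsort(scores)[:5]    # upweighting these lowers the test loss most
```

In this sketch, removing training point i corresponds to setting its weight to zero, so the predicted change in test loss is `-w[i] * scores[i]`; that prediction can be compared against an actual refit with `w[i] = 0`. For a large NLP model the exact Hessian solve would typically be replaced by an approximation such as Hessian-vector products, which this summary does not specify.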
