TL;DR
This paper introduces MEED, a model-agnostic framework for instance-wise feature selection (IFS) in model interpretation. It improves IFS through adversarial infidelity learning (AIL) and by integrating prior interpretation methods, and is validated through extensive experiments.
Contribution
The paper proposes MEED, a novel framework with an adversarial infidelity learning (AIL) mechanism, for more accurate, efficient, and robust model interpretation, addressing key challenges in explaining feature importance.
Findings
AIL enhances feature selection accuracy.
MEED outperforms existing interpretation methods.
The framework is validated by quantitative and human evaluations.
Abstract
Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check if the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score of each feature representing the data samples to explain how the model generates the specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. Also, we focus on the following setting: using selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we involve the output of the given…
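The IFS setting described in the abstract (a per-instance importance score for each feature, evaluated by how well the selected features alone reproduce the given model's output) can be illustrated with a small sketch. This is not the MEED method: there is no learned explainer, no approximator, and no adversarial infidelity learning. The names `black_box`, `top_k_mask`, and `fidelity`, the toy linear model, and the `|x * w|` scoring heuristic are all assumptions made for illustration, and the sketch feeds the masked input back into the model itself rather than training a separate predictor on the selected features.

```python
import numpy as np

def black_box(x, w):
    """Toy stand-in for the given model: a fixed linear classifier (assumption)."""
    return (x @ w > 0).astype(int)

def top_k_mask(scores, k):
    """Per-instance binary mask that keeps the k highest-scoring features."""
    idx = np.argsort(-scores, axis=1)[:, :k]
    mask = np.zeros_like(scores, dtype=float)
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return mask

def fidelity(x, w, mask):
    """Fraction of instances where the prediction from the selected features
    alone matches the model's prediction on the full input."""
    return float(np.mean(black_box(x * mask, w) == black_box(x, w)))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 10))   # 200 instances, 10 features
w = rng.normal(size=10)          # weights of the toy model

# Hypothetical importance score: magnitude of each feature's contribution.
scores = np.abs(x * w)
mask = top_k_mask(scores, k=3)

print("fidelity with 3 of 10 features:", fidelity(x, w, mask))
```

Keeping all 10 features gives a fidelity of exactly 1.0 by construction, so the quantity of interest is how much fidelity survives as `k` shrinks.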
Taxonomy
Methods: Feature Selection · Generative Adversarial Imitation Learning
