Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency
Md Abdul Kadir, Gowtham Krishna Addluri, Daniel Sonntag

TL;DR
This paper investigates how to harmonize feature attribution methods across different deep learning architectures like CNNs and transformers to improve interpretability and consistency of explanations in machine learning models.
Contribution
The paper introduces a method for harmonizing feature attributions across diverse architectures, enhancing the reliability of local explanations in deep learning models.
Findings
Harmonized attributions improve interpretability across architectures
Method increases consistency of feature importance explanations
Enhances trust in model predictions regardless of architecture
Abstract
Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods, which provide local explanations of model predictions by attributing importance to individual input features, have gained significant attention. This study examines the generalization of feature attributions across various deep learning architectures, such as convolutional neural networks (CNNs) and vision transformers. We aim to assess the feasibility of utilizing a feature attribution method as a feature detector and examine how these features can be harmonized across multiple models employing distinct architectures but trained on the same data distribution. By exploring this harmonization, we aim to develop a more coherent and optimistic understanding of feature attributions, enhancing the consistency of local explanations…
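Feature attribution assigns an importance score to each input feature of a single prediction. This excerpt does not say which attribution method the paper uses. As an illustration only, here is a minimal gradient × input sketch. To stay dependency-free it estimates each partial derivative with central finite differences on a plain Python function. A real pipeline would use automatic differentiation in the model's framework instead. The function `gradient_x_input` and the toy linear model are hypothetical.

```python
from typing import Callable, Sequence


def gradient_x_input(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    eps: float = 1e-5,
) -> list[float]:
    """Attribute model(x) to each input feature as x_i * d model / d x_i.

    Partial derivatives are approximated with central finite differences,
    which is only practical for tiny inputs; this is an illustrative sketch.
    """
    attributions = []
    for i in range(len(x)):
        # Perturb feature i up and down by eps, keeping the others fixed.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad_i = (model(x_plus) - model(x_minus)) / (2 * eps)
        attributions.append(x[i] * grad_i)
    return attributions


# Hypothetical toy "model": a linear score w . x.
weights = [2.0, 3.0, -1.0]


def linear_model(x: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(weights, x))


attr = gradient_x_input(linear_model, [1.0, 2.0, 3.0])
print([round(a, 6) for a in attr])  # [2.0, 6.0, -3.0]
```

For a linear model the gradient is just the weight vector, so gradient × input reduces to w_i · x_i. That makes the output easy to check by hand. For a deep network the same per-feature scores would be computed at each pixel or patch and reshaped into a heatmap.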
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Materials Science
