Aligned explanations in neural networks
Corentin Lobet, Francesca Chiaromonte

TL;DR
This paper introduces Pointwise-interpretable Networks (PiNets), a pseudo-linear neural network architecture that forms a linear model for each instance, so that explanations directly construct the predictions rather than rationalize them after the fact.
Contribution
The paper proposes PiNets, a novel pseudo-linear architecture that ensures explanations are faithful and directly aligned with model predictions in complex data domains.
Findings
PiNets produce explanations that satisfy four faithfulness criteria: meaningfulness, alignment, robustness, and sufficiency (MARS).
PiNets demonstrate deep faithfulness in image classification and segmentation tasks.
The approach bridges deep learning's predictive power with linear models' interpretability.
Abstract
As artificial intelligence increasingly drives critical decisions, the ability to genuinely explain how neural networks make predictions is essential for trust. Yet, most current explanation methods offer post-hoc rationalizations rather than guaranteeing a true reflection of the model's reasoning. We introduce the notion of explanatory alignment, a requirement that explanations directly construct predictions rather than rationalize them. To achieve this in complex data domains, we present Pointwise-interpretable Networks (PiNets), a pseudo-linear architecture that forms linear models instance-wise. Evaluated on image classification and segmentation tasks, PiNets demonstrate that their explanations are deeply faithful across four criteria: meaningfulness, alignment, robustness, and sufficiency (MARS). Our contributions pave the way for promising avenues: by reconciling the predictive…
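To make the phrase "forms linear models instance-wise" concrete, here is a minimal sketch of a generic pseudo-linear predictor. It is not the authors' architecture: the coefficient network, its layer sizes, the random weights, and the names `make_coefficient_net` and `pseudo_linear_predict` are all assumptions chosen for illustration. The only idea taken from the abstract is that the prediction is built as a linear combination whose coefficients depend on the input, so the per-feature contributions are the explanation and sum exactly to the prediction.

```python
import math
import random


def make_coefficient_net(n_features, n_hidden, seed=0):
    """Hypothetical one-hidden-layer network mapping an input x to per-feature coefficients.

    Weights are random and untrained; this only illustrates the data flow.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_features)]

    def coefficients(x):
        # Nonlinear hidden layer, then a linear readout with one coefficient per feature.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return coefficients


def pseudo_linear_predict(coefficients, x):
    """Return (prediction, contributions) for a single instance.

    The prediction is defined as the sum of the contributions, so the
    explanation constructs the output instead of approximating it.
    """
    beta = coefficients(x)  # instance-specific linear model
    contributions = [b * xi for b, xi in zip(beta, x)]
    return sum(contributions), contributions


coef_net = make_coefficient_net(n_features=4, n_hidden=8)
x = [0.5, -1.0, 2.0, 0.0]
prediction, contributions = pseudo_linear_predict(coef_net, x)
```

In this sketch a feature whose value is zero contributes nothing, and inspecting `contributions` accounts for the whole prediction by construction. Whether PiNets use this exact form for images is not stated in the truncated abstract.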
