Aligned explanations in neural networks

Corentin Lobet; Francesca Chiaromonte

arXiv:2601.04378·cs.LG·May 8, 2026

TL;DR

This paper introduces Pointwise-interpretable Networks (PiNets), a pseudo-linear architecture that forms a linear model for each instance, so that explanations directly construct predictions rather than rationalize them after the fact.

Contribution

The paper proposes PiNets, a novel pseudo-linear architecture that ensures explanations are faithful and directly aligned with model predictions in complex data domains.

Findings

01. PiNets produce explanations that are meaningful, aligned, robust, and sufficient.

02. PiNets demonstrate deep faithfulness in image classification and segmentation tasks.

03. The approach bridges deep learning's predictive power with linear models' interpretability.

Abstract

As artificial intelligence increasingly drives critical decisions, the ability to genuinely explain how neural networks make predictions is essential for trust. Yet, most current explanation methods offer post-hoc rationalizations rather than guaranteeing a true reflection of the model's reasoning. We introduce the notion of explanatory alignment, a requirement that explanations directly construct predictions rather than rationalize them. To achieve this in complex data domains, we present Pointwise-interpretable Networks (PiNets), a pseudo-linear architecture that forms linear models instance-wise. Evaluated on image classification and segmentation tasks, PiNets demonstrate that their explanations are deeply faithful across four criteria: meaningfulness, alignment, robustness, and sufficiency (MARS). Our contributions pave the way for promising avenues: by reconciling the predictive…
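The idea of a pseudo-linear model that "forms linear models instance-wise" can be illustrated with a small sketch. This is not the PiNet architecture from the paper, whose details are not given on this page. It only assumes the generic form in which a network maps an input x to per-feature coefficients and the prediction is the sum of coefficient times feature. The names `make_coefficient_net` and `pseudo_linear_predict`, the layer sizes, and the random weights are all hypothetical.

```python
import math
import random


def make_coefficient_net(n_features, n_hidden, seed=0):
    """Build a tiny one-hidden-layer network (hypothetical, untrained)
    that maps an input x to one coefficient per input feature."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_features)]

    def coefficients(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return coefficients


def pseudo_linear_predict(coefficients, x):
    """Predict with a linear model whose weights depend on the input.

    The per-feature contributions are the explanation, and the prediction
    is defined as their sum, so the explanation constructs the prediction.
    """
    coefs = coefficients(x)
    contributions = [c * xi for c, xi in zip(coefs, x)]
    return sum(contributions), contributions


coefficient_net = make_coefficient_net(n_features=4, n_hidden=8)
x = [0.5, -1.0, 2.0, 0.0]
prediction, contributions = pseudo_linear_predict(coefficient_net, x)
```

Because the prediction is computed from the contributions, the two cannot disagree, which is the sense of alignment this sketch is meant to convey. A feature with value zero also contributes exactly zero. Whether such contributions are meaningful, robust, or sufficient is a separate question that the sketch does not address.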
