XNNTab -- Interpretable Neural Networks for Tabular Data using Sparse Autoencoders
Khawla Elhadri, J\"org Schl\"otterer, Christin Seifert

TL;DR
XNNTab is a neural network architecture that combines high predictive performance with interpretability for tabular data by using sparse autoencoders to create human-understandable features.
Contribution
It introduces a novel neural model that learns interpretable features, bridging the gap between blackbox neural networks and traditional interpretable models.
Findings
XNNTab outperforms other interpretable models.
XNNTab achieves comparable accuracy to non-interpretable neural networks.
The model effectively decomposes features into human-understandable concepts.
Abstract
In data-driven applications relying on tabular data, where interpretability is key, machine learning models such as decision trees and linear regression are applied. Although neural networks can provide higher predictive performance, they are not used because of their blackbox nature. In this work, we present XNNTab, a neural architecture that combines the expressiveness of neural networks and interpretability. XNNTab first learns highly non-linear feature representations, which are decomposed into monosemantic features using a sparse autoencoder (SAE). These features are then assigned human-interpretable concepts, making the overall model prediction intrinsically interpretable. XNNTab outperforms interpretable predictive models, and achieves comparable performance to its non-interpretable counterparts.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
