DoLFIn: Distributions over Latent Features for Interpretability

Phong Le; Willem Zuidema

arXiv:2011.05295·cs.CL·November 11, 2020

TL;DR

DoLFIn is an interpretable neural architecture that attaches a probability to each latent feature, which yields straightforward explanations of model decisions while slightly outperforming classical CNN and BiLSTM text classifiers on SST2 and AG-news.

Contribution

The paper presents DoLFIn, a new architecture that models features as an unordered set with associated probabilities, enhancing interpretability without sacrificing model performance.
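
To make the idea of an unordered feature set with per-feature probabilities concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `weighted_feature_set`, the weight matrices `W_feat` and `W_prob`, the tanh/sigmoid choices, and the sum pooling are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def weighted_feature_set(hidden, W_feat, W_prob):
    """Hypothetical helper: turn n latent vectors into n features,
    each with a probability in (0, 1), and pool them by a weighted sum.

    hidden: (n, d) array of latent vectors
    W_feat: (d, k) projection producing feature values (assumed)
    W_prob: (d, 1) projection producing one logit per feature (assumed)
    """
    features = np.tanh(hidden @ W_feat)               # (n, k) feature values
    probs = sigmoid(hidden @ W_prob).squeeze(-1)      # (n,) importance weights
    pooled = (probs[:, None] * features).sum(axis=0)  # (k,) order-independent
    return features, probs, pooled


rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 8))
W_feat = rng.normal(size=(8, 4))
W_prob = rng.normal(size=(8, 1))

features, probs, pooled = weighted_feature_set(hidden, W_feat, W_prob)

# Shuffling the features leaves the pooled vector unchanged,
# which is what treating them as an unordered set means here.
perm = rng.permutation(5)
_, _, pooled_perm = weighted_feature_set(hidden[perm], W_feat, W_prob)

# Rank features by probability, highest first, as a simple explanation.
ranking = np.argsort(-probs)
```

In this sketch, an explanation amounts to reading off `ranking`: the features with the highest probabilities contribute most to the pooled representation passed on for further processing.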

Findings

1. DoLFIn provides clear probability-based explanations for model decisions.

2. It slightly outperforms classical CNN and BiLSTM models on SST2 and AG-news datasets.

3. The approach maintains interpretability while achieving competitive accuracy.

Abstract

Interpreting the inner workings of neural models is a key step in ensuring the robustness and trustworthiness of the models, but work on neural network interpretability typically faces a trade-off: either the models are too constrained to be very useful, or the solutions found by the models are too complex to interpret. We propose a novel strategy for achieving interpretability that, in our experiments, avoids this trade-off. Our approach builds on the success of using probability as the central quantity, for instance within the attention mechanism. In our architecture, DoLFIn (Distributions over Latent Features for Interpretability), we do not determine beforehand what each feature represents, and the features all go into an unordered set. Each feature has an associated probability ranging from 0 to 1, weighing its importance for further processing. We show that, unlike…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling

Methods: Interpretability · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Bidirectional LSTM