Improving Interpretability via Explicit Word Interaction Graph Layer
Arshdeep Sekhon, Hanjie Chen, Aman Shrivastava, Zhe Wang, Yangfeng Ji, Yanjun Qi

TL;DR
This paper introduces WIGRAPH, a trainable neural network layer that learns a global word interaction graph to improve both the interpretability and the prediction accuracy of NLP text classifiers across multiple architectures and datasets.
Contribution
The paper presents WIGRAPH, a novel layer that explicitly models word interactions to enhance interpretability and performance in neural NLP classifiers.
Findings
WIGRAPH improves interpretability of NLP models.
Adding WIGRAPH enhances prediction accuracy.
WIGRAPH is compatible with multiple NLP architectures.
Abstract
Recent NLP literature has seen growing interest in improving model interpretability. Along this direction, we propose a trainable neural network layer that learns a global interaction graph between words and then selects more informative words using the learned word interactions. Our layer, which we call WIGRAPH, can plug into any neural network-based NLP text classifier right after its word embedding layer. Across multiple SOTA NLP models and various NLP datasets, we demonstrate that adding the WIGRAPH layer substantially improves NLP models' interpretability and enhances models' prediction performance at the same time.
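To make the idea in the abstract concrete, the following is a toy sketch of a layer that sits after the word embeddings, mixes token embeddings through a global word-by-word interaction matrix, and then keeps only the highest-scoring words. It is not the authors' implementation: the random graph, the weighted-average aggregation, the L2-norm score, the top-k selection, and every function name here are assumptions made purely for illustration.

```python
import math
import random


def make_interaction_graph(vocab_size, seed=0):
    """Hypothetical stand-in for a learned global word-interaction matrix.

    In a trained layer these weights would be parameters; here they are
    random values in [0, 1) so the example runs on its own.
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(vocab_size)] for _ in range(vocab_size)]


def interact(token_ids, embeddings, graph):
    """Mix each token's embedding with those of all tokens in the sentence,
    weighted by the global graph entry for that word pair (assumed rule)."""
    dim = len(embeddings[0])
    mixed = []
    for wi in token_ids:
        weights = [graph[wi][wj] for wj in token_ids]
        total = sum(weights) or 1.0  # avoid division by zero
        mixed.append([
            sum(w * embeddings[wj][d] for w, wj in zip(weights, token_ids)) / total
            for d in range(dim)
        ])
    return mixed


def select_words(mixed, keep):
    """Keep the `keep` tokens with the largest L2 norm (an assumed score)
    and zero out the rest, preserving the sentence length."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in mixed]
    ranked = sorted(range(len(mixed)), key=lambda i: norms[i], reverse=True)
    top = set(ranked[:keep])
    mask = [1 if i in top else 0 for i in range(len(mixed))]
    masked = [[v * m for v in row] for row, m in zip(mixed, mask)]
    return masked, mask


# Usage: a 6-word vocabulary, 4-dimensional embeddings, one 4-token sentence.
vocab_size, dim = 6, 4
rng = random.Random(1)
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
graph = make_interaction_graph(vocab_size)
sentence = [0, 3, 5, 2]
masked, mask = select_words(interact(sentence, embeddings, graph), keep=2)
```

The output `masked` has the same shape as the input embeddings, so a downstream classifier could consume it unchanged, and `mask` indicates which words were retained, which is the kind of signal an interpretability analysis would inspect.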
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning in Healthcare
