Learning with Interpretable Structure from Gated RNN
Bo-Jian Hou, Zhi-Hua Zhou

TL;DR
This paper explores extracting finite state automata (FSAs) from gated RNNs to improve interpretability. It reports that the learned FSAs are more trustworthy than the RNNs they are learned from and can reveal how RNNs handle text classification.
Contribution
The paper introduces two clustering-based methods for learning FSAs from RNNs. The learned FSA serves as an interpretable structure and offers insight into the RNN's inner mechanism.
Findings
An FSA learned from an RNN is more trustworthy than the RNN it was learned from.
RNNs with fewer gates can still maintain performance, which can guide RNN design.
FSA states correspond to semantic concepts in text classification.
Abstract
The interpretability of deep learning models has attracted extensive attention in recent years. It would be beneficial if we could learn an interpretable structure from deep learning models. In this paper, we focus on Recurrent Neural Networks (RNNs), especially gated RNNs, whose inner mechanism is still not clearly understood. We find that a Finite State Automaton (FSA), which processes sequential data, has a more interpretable inner mechanism according to the definition of interpretability and can be learned from RNNs as the interpretable structure. We propose two methods to learn an FSA from an RNN based on two different clustering methods. With the learned FSA and via experiments on artificial and real datasets, we find that the FSA is more trustworthy than the RNN from which it was learned, which gives the FSA a chance to substitute for RNNs in applications involving human lives or dangerous facilities. Besides, we analyze…
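The abstract describes learning an FSA from an RNN by clustering. As a rough illustration of that general idea only, here is a minimal sketch. It does not reproduce the paper's two methods. Everything in it is an assumption made for illustration: `toy_rnn_step` and `toy_rnn_readout` are hypothetical hand-written stand-ins for a trained gated RNN, `H0` is an assumed initial hidden state, and plain k-means is swapped in for whichever clustering methods the authors use. Hidden states are clustered into FSA states, and each transition is set by majority vote over the observed steps.

```python
import math
from collections import Counter, defaultdict
from itertools import product

H0 = (0.9, 0.1)  # assumed initial hidden state of the toy model

def toy_rnn_step(h, x):
    """Hypothetical stand-in for a trained RNN cell; it tracks the parity of 1s."""
    a, b = (h[1], h[0]) if x == 1 else h
    drift = 0.02 * x  # small offset so hidden states vary slightly
    return (0.7 * a + 0.3 * (a > b) + drift, 0.7 * b + 0.3 * (b > a) - drift)

def toy_rnn_readout(h):
    """Hypothetical classifier head: predicts 1 when the count of 1s is odd."""
    return int(h[1] > h[0])

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [tuple(sum(col) / len(groups[i]) for col in zip(*groups[i]))
                   if groups[i] else centers[i] for i in range(k)]
    return centers

def learn_fsa(sequences, k=2):
    """Cluster hidden states into k FSA states and vote on transitions and labels."""
    steps = []  # (previous hidden state, input symbol, next hidden state)
    for seq in sequences:
        h = H0
        for x in seq:
            h_next = toy_rnn_step(h, x)
            steps.append((h, x, h_next))
            h = h_next
    points = [H0] + [h_next for _, _, h_next in steps]
    centers = kmeans(points, k)
    counts, votes = defaultdict(Counter), defaultdict(Counter)
    for h in points:
        votes[nearest(h, centers)][toy_rnn_readout(h)] += 1
    for h, x, h_next in steps:
        counts[(nearest(h, centers), x)][nearest(h_next, centers)] += 1
    delta = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    accept = {s: v.most_common(1)[0][0] for s, v in votes.items()}
    return nearest(H0, centers), delta, accept

def run_fsa(fsa, seq):
    """Classify a sequence using only the learned transition table."""
    state, delta, accept = fsa
    for x in seq:
        state = delta[(state, x)]
    return accept[state]

# Learn from all binary strings of length 1 to 4, then query a longer one.
train = [seq for n in range(1, 5) for seq in product([0, 1], repeat=n)]
fsa = learn_fsa(train)
prediction = run_fsa(fsa, (1, 0, 1, 1, 0, 1))
```

The learned `delta` table is small enough to read directly, which is the sense in which the extracted structure is easier to inspect than the hidden-state dynamics. With a real RNN, the number of clusters `k` and the clustering method would be design choices that affect how faithfully the FSA follows the network.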
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning in Materials Science · Topic Modeling
Methods: Interpretability
