Interpreting convolutional networks trained on textual data
Reza Marzban, Christopher John Crick

TL;DR
This paper investigates interpretability of convolutional neural networks trained on text data by identifying key words that drive model decisions, leading to more efficient and understandable NLP models.
Contribution
It introduces a method to analyze CNNs on text, identifies important words influencing decisions, and demonstrates that models trained on these words perform as well as full models.
Findings
A small set of key words carries most of the model's logic; the remaining 95% of words can be removed
Models trained on only the top 5% of words match the original model's performance
Approach enhances understanding and efficiency of NLP models
Abstract
There have been many advances in the artificial intelligence field due to the emergence of deep learning. In almost all sub-fields, artificial neural networks have reached or exceeded human-level performance. However, most of the models are not interpretable. As a result, it is hard to trust their decisions, especially in life-and-death scenarios. In recent years, there has been a movement toward creating explainable artificial intelligence, but most work to date has concentrated on image processing models, as it is easier for humans to perceive visual patterns. There has been little work in other fields like natural language processing. In this paper, we train a convolutional model on textual data and analyze the global logic of the model by studying its filter values. In the end, we find the words in our corpus that are most important to our model's logic and remove the rest (95%). New models…
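The idea of ranking words by how strongly convolutional filters respond to them, then keeping only the top few percent, can be illustrated with a small sketch. This is not the authors' procedure, which the abstract does not spell out. The scoring rule (the largest non-negative dot product between a word's embedding and any filter position), the function names, and the toy embeddings and filter are all assumptions made for illustration.

```python
def word_importance(embeddings, filters):
    """Score each word by its strongest response to any filter position.

    embeddings: dict mapping word -> embedding vector (list of floats)
    filters: list of filters; each filter is a list of position vectors
             with the same length as the embeddings
    The scoring rule is an assumption, not the paper's stated method.
    """
    scores = {}
    for word, vec in embeddings.items():
        best = 0.0  # negative responses are clipped to zero, ReLU-style
        for filt in filters:
            for position in filt:
                activation = sum(w * x for w, x in zip(position, vec))
                best = max(best, activation)
        scores[word] = best
    return scores


def keep_top_percent(scores, percent=5):
    """Return the set of words whose scores fall in the top `percent`."""
    n_keep = max(1, len(scores) * percent // 100)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def filter_tokens(tokens, vocab):
    """Drop every token that is not in the retained vocabulary."""
    return [t for t in tokens if t in vocab]


# Toy data (hypothetical): 2-dimensional embeddings and one width-2 filter.
embeddings = {
    "great": [2.0, 0.0],
    "awful": [0.0, 3.0],
    "the": [0.1, 0.1],
    "a": [0.0, 0.1],
}
filters = [[[1.0, 0.0], [0.0, 1.0]]]

scores = word_importance(embeddings, filters)
vocab = keep_top_percent(scores, percent=50)  # 50% only because the toy vocabulary is tiny
reduced = filter_tokens("the movie was great".split(), vocab)
```

In a real setting the embeddings and filters would come from the trained network, and a new model would then be trained on the reduced text to check whether accuracy holds.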
