Interpreting convolutional networks trained on textual data

Reza Marzban; Christopher John Crick

arXiv:2010.13585·cs.CL·March 19, 2021

TL;DR

This paper investigates interpretability of convolutional neural networks trained on text data by identifying key words that drive model decisions, leading to more efficient and understandable NLP models.

Contribution

It introduces a method to analyze CNNs on text, identifies important words influencing decisions, and demonstrates that models trained on these words perform as well as full models.

Findings

01

The most important 5% of words carry the model's logic; the remaining 95% can be removed

02

Models trained on only the top 5% of words match the original models' performance

03

Approach enhances understanding and efficiency of NLP models

Abstract

There have been many advances in the artificial intelligence field due to the emergence of deep learning. In almost all sub-fields, artificial neural networks have reached or exceeded human-level performance. However, most of the models are not interpretable. As a result, it is hard to trust their decisions, especially in life-and-death scenarios. In recent years, there has been a movement toward creating explainable artificial intelligence, but most work to date has concentrated on image processing models, as it is easier for humans to perceive visual patterns. There has been little work in other fields like natural language processing. In this paper, we train a convolutional model on textual data and analyze the global logic of the model by studying its filter values. In the end, we find the words in our corpus that are most important to our model's logic and remove the rest (95%). New models…
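The word-selection idea in the abstract can be illustrated with a minimal sketch. The scoring rule here is an assumption for illustration only, not necessarily the authors' procedure: each word is scored by the strongest absolute response of its embedding to any single position of any convolutional filter, and only the top 5% of words are kept. The names `word_importance`, `top_fraction`, and `filter_tokens`, and the toy embeddings and filter, are hypothetical.

```python
def word_importance(embeddings, filters):
    """Score each word by its strongest absolute response to any filter position.

    embeddings: dict mapping word -> embedding vector (list of floats)
    filters: list of filters; each filter is a list of per-position weight vectors
    NOTE: this scoring rule is an illustrative assumption, not the paper's method.
    """
    scores = {}
    for word, vec in embeddings.items():
        best = 0.0
        for filt in filters:
            for position in filt:
                # Dot product of one filter position with the word embedding.
                response = sum(w * x for w, x in zip(position, vec))
                best = max(best, abs(response))
        scores[word] = best
    return scores


def top_fraction(scores, fraction=0.05):
    """Return the set of highest-scoring words, keeping at least one word."""
    keep = max(1, int(len(scores) * fraction))
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:keep])


def filter_tokens(tokens, vocabulary):
    """Drop every token that is not in the retained vocabulary."""
    return [t for t in tokens if t in vocabulary]


# Toy data (hypothetical): 19 weak words and one word with a strong response.
embeddings = {f"w{i}": [0.01 * i, 0.0] for i in range(19)}
embeddings["excellent"] = [2.0, 1.0]
filters = [[[1.0, 0.0], [0.0, 1.0]]]  # one filter spanning two positions

scores = word_importance(embeddings, filters)
kept = top_fraction(scores, fraction=0.05)
filtered = filter_tokens(["w1", "excellent", "w2"], kept)
```

A reduced corpus built with `filter_tokens` could then be used to retrain a smaller model and compare it against the original, which is the comparison the abstract describes.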
