TL;DR
This paper presents an interpretable machine learning approach to text document classification: it traces each classification decision back to individual words using layer-wise relevance propagation (LRP), explaining predictions without explicit semantic extraction.
Contribution
It adapts layer-wise relevance propagation to word-based models, enabling interpretation of their classification decisions, and introduces a new measure of model explainability based on relevance scores.
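To make the idea of tracing a decision back to individual words concrete, here is a minimal sketch for the simplest case, a linear bag-of-words classifier. It is an illustration under stated assumptions, not the paper's implementation: the vocabulary, weights, bias, and the helper name `lrp_linear` are all hypothetical, and spreading the bias equally over the words present in the document is one possible convention chosen here so that the relevances sum to the classifier score.

```python
import numpy as np

def lrp_linear(weights, bias, x):
    """Word-level relevance for a linear score f(x) = w . x + b.

    Hypothetical helper: each word's relevance is its weighted feature
    value, plus an equal share of the bias for words that actually
    occur in the document (an assumption made for this sketch).
    """
    relevance = weights * x
    present = x != 0
    if present.any():
        relevance = relevance + present * (bias / present.sum())
    return relevance

# Toy vocabulary and model parameters (made up for illustration).
vocab = ["rocket", "orbit", "goal", "team"]
weights = np.array([1.5, 1.0, -2.0, -0.5])
bias = 0.2

# Bag-of-words counts for one document: "rocket rocket orbit team".
x = np.array([2.0, 1.0, 0.0, 1.0])

relevance = lrp_linear(weights, bias, x)
score = float(weights @ x + bias)

# Rank words from most supporting to most contradicting the class.
for idx in np.argsort(-relevance):
    print(f"{vocab[idx]:>8s}  {relevance[idx]:+.3f}")
print("sum of relevances:", relevance.sum(), "score:", score)
```

Words with positive relevance speak for the class, words with negative relevance speak against it, and absent words receive zero. For a CNN the same conservation idea is applied layer by layer, from the output back to the input word vectors.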
Findings
The CNN exhibits higher explainability than the SVM despite similar accuracy.
Relevance scores effectively identify the words that contribute to a classification decision.
Document vectors generated from the relevance scores capture semantic information.
Abstract
Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, allowing very large text collections to be annotated, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and…
Taxonomy
Methods: Support Vector Machine
