Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks
Ximing Wen

TL;DR
This paper introduces a novel approach combining prototypical networks with transformer-based language models to create inherently interpretable text classification models that maintain high accuracy.
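As a rough illustration of how a prototype layer can sit on top of an LM encoder, the sketch below scores pooled sentence embeddings against learned prototype vectors and maps those similarities to class logits. It assumes a ProtoPNet-style log-similarity and a fixed prototype-to-class weight matrix. The function name `prototype_logits` and the toy shapes are hypothetical, and random vectors stand in for real LM embeddings.

```python
import numpy as np

def prototype_logits(embeddings, prototypes, class_weights, eps=1e-4):
    """Hypothetical prototype layer over LM sentence embeddings (a sketch).

    embeddings:    (batch, dim) pooled encoder outputs (random stand-ins here)
    prototypes:    (num_protos, dim) learned prototype vectors
    class_weights: (num_protos, num_classes) prototype-to-class connections
    Returns (logits, similarities).
    """
    # Squared Euclidean distance from every embedding to every prototype.
    diff = embeddings[:, None, :] - prototypes[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)                  # (batch, num_protos)
    # ProtoPNet-style similarity: large when the distance is small, always > 0.
    sims = np.log((dist + 1.0) / (dist + eps))
    logits = sims @ class_weights                      # (batch, num_classes)
    return logits, sims

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))        # 4 sentences, 8-dim embeddings (toy sizes)
protos = rng.normal(size=(6, 8))     # 6 prototypes
# Assumed layout: prototypes 0-2 vote for class 0, prototypes 3-5 for class 1.
W = np.zeros((6, 2))
W[:3, 0] = 1.0
W[3:, 1] = 1.0
logits, sims = prototype_logits(emb, protos, W)
pred = logits.argmax(axis=1)         # predicted class per sentence
```

Because each logit is a weighted sum of prototype similarities, the prototypes with the largest entries in `sims` indicate which learned patterns drove a given prediction.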
Contribution
It proposes a white-box multi-head graph attention prototype network and extends it with contrastive learning for improved interpretability and performance.
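The summary does not spell out the network's architecture, so the following is only a generic sketch of multi-head graph attention (in the style of Graph Attention Networks) applied to a bipartite graph in which prototype nodes attend to token nodes. The function `multihead_graph_attention`, the adjacency mask, and all shapes are assumptions for illustration. The per-head attention weights are the kind of quantity a white-box model could expose as part of an explanation.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def multihead_graph_attention(tokens, prototypes, W, a, adj):
    """GAT-style attention from prototype nodes to token nodes (hypothetical).

    tokens:     (T, d) token embeddings from the LM encoder (assumed)
    prototypes: (P, d) prototype vectors
    W:          (H, d, d_h) per-head linear projection
    a:          (H, 2 * d_h) per-head attention vector
    adj:        (P, T) bool, True where prototype p may attend to token t;
                every row needs at least one True entry
    Returns attention weights (H, P, T) and head-averaged outputs (P, d_h).
    """
    H, d_h = W.shape[0], W.shape[2]
    alphas, outs = [], []
    for h in range(H):
        wt = tokens @ W[h]                             # (T, d_h)
        wp = prototypes @ W[h]                         # (P, d_h)
        # e_pt = LeakyReLU(a^T [W p || W t]), split into two dot products.
        e = leaky_relu((wp @ a[h, :d_h])[:, None] + (wt @ a[h, d_h:])[None, :])
        e = np.where(adj, e, -np.inf)                  # mask non-edges
        e = e - e.max(axis=1, keepdims=True)           # numerically stable softmax
        alpha = np.exp(e)
        alpha = alpha / alpha.sum(axis=1, keepdims=True)
        alphas.append(alpha)
        outs.append(alpha @ wt)                        # (P, d_h)
    # Average the heads, as GAT does in its output layer.
    return np.stack(alphas), np.mean(outs, axis=0)

rng = np.random.default_rng(1)
T, P, d, H, d_h = 5, 3, 8, 2, 4
tokens = rng.normal(size=(T, d))
prototypes = rng.normal(size=(P, d))
W = rng.normal(size=(H, d, d_h))
a = rng.normal(size=(H, 2 * d_h))
adj = np.ones((P, T), dtype=bool)
adj[0, :2] = False   # hypothetical: prototype 0 ignores the first two tokens
alpha, proto_out = multihead_graph_attention(tokens, prototypes, W, a, adj)
```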
Findings
Enhanced interpretability through instance-based explanations.
Maintained high classification accuracy with the proposed models.
Improved document classification performance with contrastive learning.
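The summary does not say which contrastive objective is used. One widely used option for labelled text is a supervised contrastive (SupCon-style) loss, which pulls embeddings of same-class documents together and pushes other classes apart. The sketch below is that generic loss, not necessarily the dissertation's formulation; `supervised_contrastive_loss` and the default temperature are assumptions.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Generic supervised contrastive loss on a batch of embeddings (a sketch).

    z:      (batch, dim) embeddings, batch >= 2
    labels: (batch,) integer class labels
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalise rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)            # drop i == j from the softmax
    # Row-wise log-softmax over all other samples in the batch.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                             # anchors with at least one positive
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-pos_sum[valid] / pos_counts[valid]))

# Two tight clusters in 2-D: labels that match the clusters vs. labels that cut across them.
z = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
aligned = supervised_contrastive_loss(z, np.array([0, 0, 1, 1]))
mixed = supervised_contrastive_loss(z, np.array([0, 1, 0, 1]))
```

When the clusters agree with the labels the loss is lower than when the labels cut across them; minimising it is what reshapes the encoder's embedding space around the classes.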
Abstract
Pretrained transformer-based Language Models (LMs) are known for substantially improving performance on NLP tasks, but their black-box nature, and the resulting lack of interpretability, has been a major concern. My dissertation focuses on using prototypical networks to develop intrinsically interpretable models that use LMs as encoders while preserving their strong performance. I began my research by investigating how to improve the performance of interpretable sarcasm detection models. My proposed approach captures sentiment incongruity to improve accuracy while offering instance-based explanations for its classification decisions. Later, I developed a novel white-box multi-head graph attention-based prototype network designed to explain the decisions of text classification models without sacrificing the accuracy of the original black-box LMs. In…
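Instance-based explanations in prototype networks are commonly produced by tying each prototype to its nearest training example, so that a prediction can be explained as resembling specific training sentences. The sketch assumes that projection step (as in ProtoPNet-style models) and uses toy 2-D embeddings. The functions `nearest_training_examples` and `explain`, the example sentences, and the vectors are all hypothetical.

```python
import numpy as np

def nearest_training_examples(prototypes, train_embeddings, train_texts, k=1):
    """Return the k training texts closest to each prototype (squared L2)."""
    dist = ((prototypes[:, None, :] - train_embeddings[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(dist, axis=1)[:, :k]              # (num_protos, k)
    return [[train_texts[j] for j in row] for row in idx]

def explain(embedding, prototypes, proto_examples):
    """Pick the prototype closest to one input and return its example texts."""
    dist = ((prototypes - embedding) ** 2).sum(axis=1)
    best = int(np.argmin(dist))
    return best, proto_examples[best]

# Toy 2-D "embeddings"; in practice these would come from the LM encoder.
train_texts = [
    "great, another Monday",
    "I love sunny days",
    "oh wonderful, the train is late again",
    "the food was tasty",
]
train_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
prototypes = np.array([[0.97, 0.03], [0.03, 0.97]])

proto_examples = nearest_training_examples(prototypes, train_emb, train_texts)
best, evidence = explain(np.array([0.8, 0.2]), prototypes, proto_examples)
```

Here the input lands closest to prototype 0, so the explanation cites the training sentence that prototype 0 was projected onto.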
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Contrastive Learning · Attention Model
