Extend and Explain: Interpreting Very Long Language Models
Joel Stremmel, Brian L. Hill, Jeffrey Hertzberg, Jaime Murillo, Llewelyn Allotey, Eran Halperin

TL;DR
This paper introduces MSP, a novel method for interpreting long transformer language models, especially in medical contexts, by efficiently identifying influential text segments and outperforming previous explainability techniques in speed and informativeness.
Contribution
We propose MSP, a new explainability technique for long LMs that improves speed and accuracy in identifying relevant text segments, validated in medical diagnosis prediction.
Findings
MSP identifies 1.7x more informative text blocks than previous methods.
MSP runs up to 100x faster than existing explainability algorithms.
MSP is applicable to any text classifier, especially long LMs.
Abstract
While Transformer language models (LMs) are state-of-the-art for information extraction, long text introduces computational challenges requiring suboptimal preprocessing steps or alternative model architectures. Sparse attention LMs can represent longer sequences, overcoming performance hurdles. However, it remains unclear how to explain predictions from these models, as not all tokens attend to each other in the self-attention layers, and long sequences pose computational challenges for explainability algorithms when runtime depends on document length. These challenges are severe in the medical context where documents can be very long, and machine learning (ML) models must be auditable and trustworthy. We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction, apply MSP in the context of predicting diagnoses from medical text, and…
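The abstract describes MSP only at a high level: mask text, then identify the blocks that contribute to a prediction. As a rough illustration of that general idea, the sketch below scores blocks by randomly masking them and averaging the resulting drop in predicted probability. It is not the authors' implementation. The function name `block_importance`, the parameters `block_size`, `mask_prob`, and `n_rounds`, the averaging rule, and the toy classifier are all assumptions made for this example.

```python
import random
from typing import Callable, List, Sequence


def block_importance(
    tokens: Sequence[str],
    predict_proba: Callable[[Sequence[str]], float],
    block_size: int = 4,      # assumed block length, not taken from the paper
    mask_prob: float = 0.3,   # assumed chance of masking each block per round
    n_rounds: int = 200,      # assumed number of sampling rounds
    mask_token: str = "[MASK]",
    seed: int = 0,
) -> List[float]:
    """Score each block by the average probability drop observed when it is masked."""
    rng = random.Random(seed)
    n_blocks = (len(tokens) + block_size - 1) // block_size
    baseline = predict_proba(tokens)
    drop_sum = [0.0] * n_blocks
    times_masked = [0] * n_blocks

    for _ in range(n_rounds):
        # Sample which blocks to hide in this round.
        masked_blocks = [b for b in range(n_blocks) if rng.random() < mask_prob]
        if not masked_blocks:
            continue
        masked = list(tokens)
        for b in masked_blocks:
            start = b * block_size
            end = min(start + block_size, len(tokens))
            masked[start:end] = [mask_token] * (end - start)
        # Credit the observed drop to every block hidden in this round.
        drop = baseline - predict_proba(masked)
        for b in masked_blocks:
            drop_sum[b] += drop
            times_masked[b] += 1

    return [s / c if c else 0.0 for s, c in zip(drop_sum, times_masked)]


def toy_classifier(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a real model: keys on a single word."""
    return 0.9 if "pneumonia" in tokens else 0.1


note = "patient reports mild cough chest xray shows pneumonia follow up in two weeks".split()
scores = block_importance(note, toy_classifier)
top_block = scores.index(max(scores))
```

In this toy setup the second block (tokens 4 to 7) holds the only word the classifier reacts to, so it receives the highest score. With a real long-document classifier, `predict_proba` would wrap a model forward pass, and the number of rounds would trade runtime against the stability of the scores.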
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing
