Three-level Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification
Yuqi Si, Kirk Roberts

TL;DR
This paper introduces a three-level hierarchical transformer network designed to effectively model long clinical notes and multiple documents for patient prediction tasks, significantly extending input length capabilities.
Contribution
The paper proposes a novel three-level hierarchical transformer architecture that captures dependencies across words, sentences, notes, and patients, improving long-sequence clinical document classification.
Findings
Outperforms state-of-the-art models like BigBird on MIMIC-III.
Handles longer input sequences than traditional BERT.
Optimized hyper-parameters for computational efficiency.
Abstract
We present a Three-level Hierarchical Transformer Network (3-level-HTN) for modeling long-term dependencies across clinical notes for the purpose of patient-level prediction. The network is equipped with three levels of Transformer-based encoders to learn progressively from words to sentences, sentences to notes, and finally notes to patients. The first level, from word to sentence, directly applies a pre-trained BERT model as a fully trainable component. The second and third levels each implement a stack of Transformer-based encoders, and the final patient representation is then fed into a classification layer for clinical predictions. Compared to conventional BERT models, our model increases the maximum input length from 512 tokens to much longer sequences that are appropriate for modeling large numbers of clinical notes. We empirically examine different hyper-parameters to identify…
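The nesting described in the abstract (words into sentences, sentences into notes, notes into a patient) can be illustrated with a small input-shaping sketch. It is not the authors' code: the function names, the padding id `PAD`, and the example limits are hypothetical, and the encoders themselves are left out. It shows only how a patient's token ids might be padded or truncated into a fixed `[notes][sentences][tokens]` layout, and why the total token budget can exceed the 512-token limit of a single BERT input.

```python
from typing import List

PAD = 0  # hypothetical padding token id (assumption, not from the paper)


def pad_tokens(tokens: List[int], max_tokens: int) -> List[int]:
    """Truncate or right-pad one sentence to exactly max_tokens ids."""
    clipped = tokens[:max_tokens]
    return clipped + [PAD] * (max_tokens - len(clipped))


def build_patient_input(
    notes: List[List[List[int]]],
    max_notes: int,
    max_sents: int,
    max_tokens: int,
) -> List[List[List[int]]]:
    """Shape nested token ids into a [max_notes][max_sents][max_tokens] layout.

    notes[i][j] is the list of token ids for sentence j of note i.
    Extra notes, sentences, and tokens are dropped; missing ones are padded.
    """
    shaped = []
    for note in notes[:max_notes]:
        sents = [pad_tokens(s, max_tokens) for s in note[:max_sents]]
        # Pad the note with all-PAD sentences up to max_sents.
        sents += [[PAD] * max_tokens for _ in range(max_sents - len(sents))]
        shaped.append(sents)
    # Pad the patient with all-PAD notes up to max_notes.
    while len(shaped) < max_notes:
        shaped.append([[PAD] * max_tokens for _ in range(max_sents)])
    return shaped


def max_input_tokens(max_notes: int, max_sents: int, max_tokens: int) -> int:
    """Total token budget of the nested layout."""
    return max_notes * max_sents * max_tokens
```

With illustrative limits of 10 notes, 20 sentences per note, and 64 tokens per sentence, `max_input_tokens(10, 20, 64)` gives 12800 tokens per patient. In a layout like this, the word-level encoder would see only one `max_tokens`-long sentence at a time, which is how a hierarchy can stay within a per-sequence limit while covering a much longer input overall.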
Taxonomy
Topics: Machine Learning in Healthcare · AI in cancer detection · Topic Modeling
Methods: Multi-Head Attention · Linear Layer · BigBird · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Dropout · Adam
