DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning

Deyi Li; Zijun Yao; Qi Xu; Muxuan Liang; Lingyao Li; Zijian Xu; Mei Liu

arXiv:2603.10180·cs.LG·March 12, 2026

Open Access · 3 Reviews

TL;DR

DT-BEHRT introduces a novel graph-enhanced transformer model for EHR data that explicitly models disease trajectories and provides interpretable patient representations, improving predictive accuracy and clinical relevance.

Contribution

The paper presents a new disease trajectory-aware transformer architecture with a tailored pre-training method for better interpretability and performance in EHR-based patient modeling.

Findings

01. Achieves strong predictive performance on the MIMIC-III and MIMIC-IV benchmark datasets.

02. Provides interpretable patient representations aligned with clinical reasoning.

03. Demonstrates robustness through a novel pre-training strategy.

Abstract

The growing adoption of electronic health record (EHR) systems has provided unprecedented opportunities for predictive modeling to guide clinical decision making. Structured EHRs contain longitudinal observations of patients across hospital visits, where each visit is represented by a set of medical codes. While sequence-based, graph-based, and graph-enhanced sequence approaches have been developed to capture rich code interactions over time or within the same visits, they often overlook the inherent heterogeneous roles of medical codes arising from distinct clinical characteristics and contexts. To this end, in this study we propose the Disease Trajectory-aware Transformer for EHR (DT-BEHRT), a graph-enhanced sequential architecture that disentangles disease trajectories by explicitly modeling diagnosis-centric interactions within organ systems and capturing asynchronous progression…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

S1. The paper conducts experiments on the MIMIC datasets across multiple tasks. Beyond quantitative metrics, the authors include patient-level case studies that qualitatively analyze model explanations.

S2. Each module in DT-BEHRT, i.e., sequence, aggregation, progression, and patient representation, is clearly motivated from a clinician’s perspective, reflecting real-world medical reasoning.

S3. The introduction of two pretraining tasks allows the model to fully leverage EHR data across visits and dis

Weaknesses

W1. The paper integrates design patterns from both Transformer-based and graph-based models, resulting in an architecture that appears more incremental. The combination of SR–DA–PR resembles conventional Transformer stacks, with the main variation being the use of ancestor node embeddings and customized losses. Similarly, the SR–DP–PR path largely parallels prior graph-based pipelines that model interactions between disease, visit, and patient nodes.

W2. The proposed Global Code Masking and An

Reviewer 02 · Rating 4 · Confidence 5

Strengths

1. Demonstrated Modular Contribution: The framework consists of multiple modules (DA, DP) and a new pre-training task (ACP). A key strength is the use of an Ablation Study (Table 3 in the paper) to clearly demonstrate how much each component contributes to the model's performance improvement.

2. Clinical Interpretability: The model doesn't just aim for higher performance; it attempts to link the rationales for its predictions to clinical reasoning (e.g., problems in a specific organ system, tem

Weaknesses

1. Limited Dataset Validation: The datasets used for the experiment are limited to MIMIC-III and MIMIC-IV, which is insufficient to prove the model's generalization performance. EHR data has inherent biases depending on the hospital system, country, and ethnicity. Therefore, external validation on other large-scale ICU datasets (e.g., eICU, HiRID, UMCdb) is essential.

2. Lack of Prediction Task Diversity: The variety of prediction tasks performed is insufficient to claim that the proposed frame

Reviewer 03 · Rating 4 · Confidence 3

Strengths

* **Clinically-Aligned Architecture:** The model's DA and DP modules are designed to mirror clinical reasoning, enhancing interpretability.
* **Novel Pre-training:** The Ancestor Code Prediction (ACP) task effectively aligns the model's different modules with ontology information.
* **Strong Empirical Results:** DT-BEHRT outperforms baselines, especially on complex phenotyping and readmission tasks.
* **Targeted Ablation:** Ablation studies demonstrate the distinct contributions of the DA and DP

Weaknesses

* **Ontology Dependence:** The Disease Aggregation (DA) module is explicitly tied to the ICD-9 ontology, which may not be adaptable.
* **Fixed Aggregation Threshold:** DA tokens are activated by a fixed hyperparameter $k$, and the impact of this choice isn't explored.
* **Incomplete Pre-train Ablation:** The ablation study does not isolate the effect of the DA token decorrelation loss ($l_{cov}$).
* **Simplistic Code Roles:** The model simplifies code roles, treating diagnoses as interactive whi

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Electronic Health Records Systems