# Sequential pattern transformer (SPT): a generative and interpretable framework for predicting disease trajectories

**Authors:** Mohammad Assadi Shalmani, Masoud Khani, Amirsajjad Taleban, Zihao Yi, Jennifer T. Fink, Christopher E. Weber, Qiang Lu, Jake Luo

PMC · DOI: 10.1007/s00521-025-11695-4 · Neural Computing & Applications · 2026-03-28

## TL;DR

This paper introduces SPT, a transparent AI framework that generates interpretable disease progression paths for type 2 diabetes patients using real-world data.

## Contribution

The novel SPT framework combines sequential pattern mining with generative transformers to create interpretable and accurate disease trajectory predictions.

## Key findings

- SPT achieved 85.78% Top-5 accuracy, outperforming a standard LSTM baseline (71.47%) by over 14 percentage points.
- The model generates interpretable Disease Atlas visualizations with explainable AI techniques like SHAP and counterfactuals.
- SPT is domain-agnostic and adaptable to diverse clinical settings through efficient fine-tuning.
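
The Top-5 figure above counts a prediction as correct when the true next diagnosis appears among the five highest-scoring candidates. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of examples whose true class is among the k highest scores."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")

    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k classes with the highest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Hypothetical toy scores over three candidate diagnoses (not data from the paper).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_targets = [1, 2, 0]

top1 = top_k_accuracy(toy_scores, toy_targets, k=1)
top3 = top_k_accuracy(toy_scores, toy_targets, k=3)

# Gap between the reported SPT and LSTM Top-5 accuracies, in percentage points.
gap_points = round(85.78 - 71.47, 2)
```

With these toy inputs only the first example is a Top-1 hit, while every example is a Top-3 hit because there are only three classes. The reported gap of 14.31 is an absolute difference in percentage points, not a relative improvement.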

## Abstract

The effective integration of artificial intelligence into clinical workflows requires models that go beyond simple prediction to generate comprehensive, explainable, and actionable disease trajectories. Addressing the limitations of opaque deep learning architectures and the noise inherent in electronic health records, we introduce the sequential pattern transformer (SPT), a novel framework that synergizes sequential pattern mining with generative transformer modeling. Using four years of inpatient data from 258,460 type 2 diabetes patients, we applied the PrefixSpan algorithm to distill noisy diagnostic histories into a curated vocabulary of 95,630 statistically validated disease progression patterns. A decoder-only transformer was trained exclusively on these evidence-based sequences to learn the temporal dynamics of disease evolution. This pattern-guided approach shifts the modeling paradigm from classification to probabilistic trajectory generation. The model achieved a robust 85.78% Top-5 accuracy, significantly outperforming a standard LSTM baseline (71.47%). Beyond predictive accuracy, the framework constructs a dynamic Disease Atlas, a branching tree structure that visualizes likely future pathways, augmented by multi-level explainable AI (XAI) including learned clinical clusters, SHAP-based feature attribution, and counterfactual simulations. Crucially, this methodology is domain-agnostic and capable of efficient fine-tuning, making it a transferable solution for adapting to diverse clinical conditions and local hospital settings. SPT thus offers a transparent, robust, and scalable framework for mapping the complex temporal dynamics of disease, bridging the gap between high-performance AI and interpretable clinical application.
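
To make the pattern-mining step concrete, here is a minimal sketch of PrefixSpan-style frequent sequence mining, assuming each patient history is a simple ordered list of single diagnosis labels. It is a simplified illustration rather than the authors' pipeline: the function name `prefixspan`, the `min_support` threshold, and the toy histories are all hypothetical, and the abstract does not state the support settings or any further validation applied to the 95,630 patterns.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixspan(sequences: Sequence[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Return every sequential pattern found in at least min_support sequences."""
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start position) pairs: the suffix of
        # each sequence that remains after matching the current prefix.
        first_seen: Dict[str, Dict[int, int]] = defaultdict(dict)
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                # Keep only the earliest occurrence per sequence, which leaves
                # the longest possible suffix for further extension.
                first_seen[seq[pos]].setdefault(sid, pos)

        for item, occurrences in first_seen.items():
            support = len(occurrences)  # number of distinct sequences
            if support >= min_support:
                pattern = prefix + (item,)
                results[pattern] = support
                mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


# Hypothetical toy histories using made-up short labels (not data from the paper).
histories = [
    ["T2D", "HTN", "CKD"],
    ["T2D", "CKD"],
    ["HTN", "T2D", "CKD"],
    ["T2D", "HTN"],
]
patterns = prefixspan(histories, min_support=3)
```

On these toy histories the only multi-step pattern that reaches a support of 3 is `("T2D", "CKD")`. In a setup like the one the abstract describes, frequent patterns of this kind, rather than raw visit histories, would form the training sequences for the decoder-only transformer.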

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148)

## Full-text entities

- **Genes:** AGXT (alanine--glyoxylate aminotransferase) [NCBI Gene 189] {aka AGT, AGT1, AGXT1, PH1, SPAT, SPT}, TOP1 (DNA topoisomerase I) [NCBI Gene 7150] {aka TOPI}, SLC5A2 (solute carrier family 5 member 2) [NCBI Gene 6524] {aka SGLT2}
- **Diseases:** COPD (MESH:D029424), atherosclerosis (MESH:D050197), Diabetes (MESH:D003920), Coronary Atherosclerosis (MESH:D003324), Osteoarthritis (MESH:D010003), muscle (MESH:D019042), renal, and cardiovascular complications (MESH:D002318), anemia (MESH:D000740), Disorder of Lipid Metabolism (MESH:D052439), disease (MESH:D004194), sarcopenia (MESH:D055948), diabetes complications (MESH:D048909), heart disease (MESH:D006331), pneumonia (MESH:D011014), deaths (MESH:D003643), cardiovascular-related kidney damage (MESH:D007674), Essential Hypertension (MESH:D000075222), obesity (MESH:D009765), type 2 diabetes (MESH:D003924), SID (MESH:D018458), infection (MESH:D007239), gastroesophageal disease (MESH:D005764), Chronic Kidney Disease (MESH:D051436), Esophageal disorders (MESH:D004941), inflammatory (MESH:D007249), Fluid (MESH:D002559), heart failure (MESH:D006333), infectious disease (MESH:D003141), Fever (MESH:D005334), Electrolyte Disorders (MESH:D014883), neuro-cognitive complications (MESH:D000079690), renal, cardiovascular, and metabolic dysfunction (MESH:D024821), sepsis (MESH:D018805), CKD (MESH:D003928), CCSR (MESH:D008310), Acute Renal Failure (MESH:D058186), uremic (MESH:D006463), Chronic non-communicable diseases (MESH:D000073296), cardiorenal metabolic syndrome (MESH:D059347)
- **Chemicals:** GLP-1RAs (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030885/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030885/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030885/full.md

---
Source: https://tomesphere.com/paper/PMC13030885