# Learning Covariate Relations in Disease Progression Models Using Symbolic Neural Networks

**Authors:** Jesper Sundell, Ylva Wahlquist, Maria C. Kjellsson, Mats O. Karlsson, Kristian Soltesz

PMC · DOI: 10.1002/psp4.70214 · 2026-03-10

## TL;DR

This paper introduces a method that uses symbolic neural networks to identify covariate relations in disease progression models automatically, without relying on predefined parametric functions.

## Contribution

The method uses symbolic neural networks to simultaneously identify the parametric covariate functions and optimize the parameters of a Markov chain disease progression model, which avoids committing to predefined parametric functions.

## Key findings

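- The method produces human-readable covariate functions by stepwise pruning of initially fully connected symbolic networks.
- On a type 2 diabetes disease progression dataset, the resulting model achieves predictive performance similar to that of a model developed on the same data with state-of-the-art methodology, while using fewer covariates.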
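
To illustrate how pruning can turn a dense symbolic layer into a readable formula, here is a minimal sketch. It is not the authors' implementation: the primitive set, the single-pass magnitude threshold, the weight values, and the covariate names `age` and `bmi` are all assumptions chosen for illustration, and the paper's actual stepwise pruning criterion is not described in this summary.

```python
import numpy as np

# Hypothetical primitive functions of one symbolic layer (an assumption,
# not the primitive set used in the paper).
PRIMITIVES = {
    "id": lambda z: z,
    "square": lambda z: z ** 2,
    "exp": np.exp,
}


def prune(weights, threshold):
    """Return a copy with all weights of magnitude below `threshold` set to zero."""
    pruned = np.array(weights, dtype=float)
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned


def to_expression(weights, out_weights, covariate_names):
    """Render the surviving connections as a formula string.

    weights[i, j] connects covariate j to primitive i;
    out_weights[i] scales the output of primitive i.
    """
    terms = []
    for name, w_row, a in zip(PRIMITIVES, weights, out_weights):
        if a == 0.0:
            continue  # primitive was pruned from the output sum
        inner = " + ".join(
            f"{w:.2f}*{c}" for w, c in zip(w_row, covariate_names) if w != 0.0
        )
        if inner:
            terms.append(f"{a:.2f}*{name}({inner})")
    return " + ".join(terms) if terms else "0"


# Illustrative (made-up) weights: rows are primitives, columns are covariates.
dense_w = np.array([[0.90, 0.01],
                    [0.02, -0.03],
                    [0.00, 0.70]])
dense_a = np.array([1.20, 0.05, -0.40])

sparse_w = prune(dense_w, threshold=0.1)
sparse_a = prune(dense_a, threshold=0.1)
formula = to_expression(sparse_w, sparse_a, ["age", "bmi"])
print(formula)
```

In this toy case the `square` primitive and the `bmi` input to `id` fall below the threshold, so only two terms and one covariate per term survive. This mirrors, in a simplified way, how pruning can reduce both the formula size and the number of covariates used.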

## Abstract

Covariate modeling provides individual predictions of outcomes by disease progression models. Current methodology for mapping covariates onto model parameters is limited by predefined parametric functions which can result in inadequate covariate selection and biased predictions by the final model. Furthermore, present methodology scales poorly to high‐dimensional data due to combinatorial limitations. In the present study, a novel method for automation of covariate model identification in disease progression models is described. Symbolic neural networks are used to simultaneously identify the parametric covariate functions and optimize model parameters of a Markov chain. By stepwise pruning of initially fully connected dense symbolic networks, humanly readable functions representing the covariate relations are produced. The presented methodology is applied to a dataset containing disease progression observations for type 2 diabetes patients. Although utilizing fewer covariates, the resulting model demonstrates predictive performance similar to that of a model which was developed on the same data using state‐of‐the‐art modeling methodology.
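
As a rough illustration of how a covariate function can drive a Markov chain, the sketch below maps a covariate vector to a transition probability and propagates state probabilities over time. The two-state discrete-time structure, the absorbing second state, the logistic link, and the form of `covariate_function` are all assumptions made for this example; the abstract does not specify the chain used in the paper.

```python
import numpy as np


def covariate_function(x):
    """Hypothetical stand-in for a learned, pruned symbolic covariate function."""
    return 0.5 * x[0] - 1.0


def transition_matrix(x):
    """Build a 2x2 transition matrix whose progression probability depends on x.

    State 0 moves to state 1 with probability p; state 1 is absorbing.
    The logistic link that keeps p in (0, 1) is an assumption.
    """
    p = 1.0 / (1.0 + np.exp(-covariate_function(x)))
    return np.array([[1.0 - p, p],
                     [0.0, 1.0]])


def state_probabilities(x, n_steps, start=(1.0, 0.0)):
    """Propagate the state distribution n_steps forward for one individual."""
    probs = np.asarray(start, dtype=float)
    P = transition_matrix(x)
    for _ in range(n_steps):
        probs = probs @ P
    return probs


# For x[0] = 2.0 the covariate function is 0, so p = 0.5 at every step.
individual = np.array([2.0])
print(state_probabilities(individual, n_steps=2))
```

Because the transition probabilities are functions of the covariates, two individuals with different covariate values get different predicted trajectories from the same model parameters, which is what makes the predictions individual.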

- **What is the current knowledge on the topic?** Current covariate model development tools are limited by predefined parametric functions, which can result in inadequate covariate selection and biased predictions by the final model. Furthermore, present methodology scales poorly to high-dimensional data due to combinatorial limitations.
- **What question did this study address?** This study describes a novel covariate modeling method for disease progression models that does not rely on predefined parametric functions. The method is applied to a type 2 diabetes disease progression dataset.
- **What does this study add to our knowledge?** Applying the method yields a model whose predictive performance is similar to that of a model developed on the same dataset using state-of-the-art modeling practice, while utilizing fewer covariates.
- **How might this change drug discovery, development, and/or therapeutics?** The method provides automatic and less constrained identification of covariate models, ultimately providing disease progression models which may more accurately predict individual risks.

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148)

## Full-text entities

- **Diseases:** type 2 diabetes (MESH:D003924)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13014801/full.md

---
Source: https://tomesphere.com/paper/PMC13014801