# Bayesian Tensor Decomposition for Clustering Latent Symptom Profiles for Verbal Autopsy Data

**Authors:** Yu Zhu, Zehang Richard Li

PMC · DOI: 10.1002/sim.70475 · Statistics in Medicine · 2026-03-03

## TL;DR

This paper introduces a Bayesian tensor decomposition method that groups verbal autopsy symptoms and models group-level sub-profiles, making cause-of-death assignment more accurate and the underlying symptom patterns easier to interpret.

## Contribution

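A Bayesian tensor decomposition framework that partitions symptoms into groups and models the joint distribution of group-level symptom sub-profiles, balancing the predictive accuracy of cause-of-death assignment against the interpretability of the latent structure.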

## Key findings

- The proposed method outperforms existing VA methods in predictive accuracy.
- It provides a more interpretable and parsimonious representation of symptom distributions.
- Applied to the PHMRC gold-standard VA dataset, it reveals new insights into the clustering patterns of both symptoms and causes.

## Abstract

Cause‐of‐death data is fundamental for understanding population health trends and inequalities as well as designing and evaluating public health interventions. A significant proportion of global deaths, particularly in low‐ and middle‐income countries (LMICs), do not have medically certified causes assigned. In such settings, verbal autopsy (VA) is a widely adopted approach to estimate disease burdens by interviewing caregivers of the deceased. Recently, latent class models have been developed to model the joint distribution of symptoms and perform probabilistic cause‐of‐death assignment. A large number of latent classes are usually needed in order to characterize the complex dependence among symptoms, making the estimated symptom profiles challenging to summarize and interpret. In this paper, we propose a flexible Bayesian tensor decomposition framework that balances the predictive accuracy of the cause‐of‐death assignment task and the interpretability of the latent structures. The key to our approach is to partition symptoms into groups and model the joint distributions of group‐level symptom sub‐profiles. The proposed methods achieve better predictive accuracy than existing VA methods and provide a more parsimonious representation of the symptom distributions. We show our methods provide new insights into the clustering patterns of both symptoms and causes using the PHMRC gold‐standard VA dataset.
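To make the latent class idea mentioned in the abstract concrete, the following is a minimal sketch of a generic latent class model for binary symptoms with Bayes-rule cause assignment. It is not the authors' tensor decomposition model: it omits the symptom grouping and all Bayesian estimation, and every name and number in it (`symptom_likelihood`, `cause_posterior`, the toy priors, class weights, and response probabilities) is a hypothetical chosen for illustration.

```python
import numpy as np


def symptom_likelihood(x, class_weights, response_probs):
    """P(x | cause) under a latent class model with binary symptoms.

    x: (J,) vector of 0/1 symptom indicators.
    class_weights: (K,) mixing weights over latent classes, summing to 1.
    response_probs: (K, J) probability that each symptom is present per class.
    """
    x = np.asarray(x)
    # Symptoms are assumed conditionally independent given the latent class,
    # so the pattern probability within a class is a product over symptoms.
    per_class = np.prod(
        np.where(x == 1, response_probs, 1.0 - response_probs), axis=1
    )
    # Marginalize over latent classes.
    return float(class_weights @ per_class)


def cause_posterior(x, prior, weights, probs):
    """P(cause | x) by Bayes' rule.

    prior: (C,) cause prior; weights: (C, K); probs: (C, K, J).
    """
    lik = np.array(
        [symptom_likelihood(x, weights[c], probs[c]) for c in range(len(prior))]
    )
    unnorm = prior * lik
    return unnorm / unnorm.sum()


# Toy, made-up parameters: 2 causes, 2 latent classes per cause, 3 symptoms.
prior = np.array([0.5, 0.5])
weights = np.array([[0.7, 0.3], [0.4, 0.6]])
probs = np.array([
    [[0.9, 0.8, 0.1], [0.6, 0.2, 0.1]],  # cause 0 symptom profiles
    [[0.2, 0.1, 0.9], [0.1, 0.3, 0.5]],  # cause 1 symptom profiles
])

# Posterior cause probabilities for a death with symptoms 1 and 2 present.
posterior = cause_posterior(np.array([1, 1, 0]), prior, weights, probs)
```

In this sketch each cause needs its own `(K, J)` table of response probabilities, which hints at why profiles become hard to summarize as the number of latent classes grows. The grouped sub-profile approach described in the abstract targets that problem; its exact form is given in the full paper.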

## Full-text entities

- **Diseases:** fever (MESH:D005334), paralysis (MESH:D010243), Stroke (MESH:D020521), pneumonia (MESH:D011014), AIDS (MESH:D000163), cough (MESH:D003371), loss of consciousness (MESH:D014474), breathing problems (MESH:D004417), Death (MESH:D003643), Symptom (MESH:D012816)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12956427/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12956427/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12956427/full.md

---
Source: https://tomesphere.com/paper/PMC12956427