Leveraging Language Models and Machine Learning in Verbal Autopsy Analysis
Yue Chu

TL;DR
This paper demonstrates that pretrained language models (PLMs) applied to verbal autopsy narratives significantly improve cause of death classification accuracy, especially when the narratives are combined with structured question data. These gains can support better-informed health policy in data-scarce settings.
Contribution
It introduces transformer-based language models for classifying verbal autopsy narratives and explores multimodal fusion strategies that combine narratives with structured questions, advancing methods for automated cause of death classification.
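One simple way to combine the two modalities is late fusion: each single-modality model (one reading the narrative, one reading the structured questions) outputs per-cause probabilities, and the final prediction comes from a weighted combination of them. The sketch below is illustrative only. The thesis explores its own fusion strategies, which may instead fuse at the feature or embedding level. The function names, the fixed `weight`, and the cause labels are hypothetical.

```python
from typing import Dict


def late_fusion(narrative_probs: Dict[str, float],
                question_probs: Dict[str, float],
                weight: float = 0.5) -> Dict[str, float]:
    """Combine per-cause probabilities from two single-modality models.

    Hypothetical sketch: `weight` is the share given to the narrative model;
    the remainder (1 - weight) goes to the question model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    # A cause missing from one model's output is treated as probability 0 there.
    causes = sorted(set(narrative_probs) | set(question_probs))
    fused = {
        c: weight * narrative_probs.get(c, 0.0)
        + (1.0 - weight) * question_probs.get(c, 0.0)
        for c in causes
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("both models assigned zero probability to every cause")
    # Renormalize so the fused scores sum to 1.
    return {c: p / total for c, p in fused.items()}


def predict_cause(probs: Dict[str, float]) -> str:
    """Return the cause with the highest fused probability."""
    return max(probs, key=probs.get)


# Hypothetical model outputs for a single death record.
narrative = {"HIV/AIDS": 0.6, "TB": 0.3, "Injury": 0.1}
questions = {"HIV/AIDS": 0.2, "TB": 0.7, "Injury": 0.1}
print(predict_cause(late_fusion(narrative, questions, weight=0.5)))  # TB
print(predict_cause(late_fusion(narrative, questions, weight=0.8)))  # HIV/AIDS
```

Shifting the weight toward the narrative model changes the predicted cause for this record. In practice such a weight would be tuned on held-out data rather than fixed by hand.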
Findings
PLMs using the narrative alone outperform question-only algorithms.
Multimodal fusion of narratives and questions further improves classification accuracy.
Classification accuracy varies with information sufficiency levels.
Abstract
In countries without civil registration and vital statistics systems, verbal autopsy (VA) is a critical tool for estimating cause of death (COD) and informing policy priorities. In a VA, interviewers ask proximal informants for details on the circumstances preceding a death, recorded both as an unstructured narrative and as answers to structured questions. Existing automated VA cause classification algorithms use only the questions and ignore the information in the narratives. In this thesis, we investigate how the VA narrative can be used for automated COD classification with pretrained language models (PLMs) and machine learning (ML) techniques. Using empirical data from South Africa, we demonstrate that with the narrative alone, transformer-based PLMs with task-specific fine-tuning outperform leading question-only algorithms at both the individual and population levels, particularly in identifying…
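The abstract reports results at both the individual and population levels. A minimal sketch of the two kinds of metric follows. It assumes individual-level performance is the share of deaths assigned the correct cause, and population-level performance is CSMF (cause-specific mortality fraction) accuracy, a metric widely used in VA validation studies. The exact metrics used in the thesis may differ, and the function names and cause labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence


def individual_accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of deaths whose predicted cause matches the reference cause."""
    if len(true) != len(pred) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def csmf(labels: Sequence[str], causes: Iterable[str]) -> Dict[str, float]:
    """Cause-specific mortality fractions: share of deaths assigned to each cause."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in causes}


def csmf_accuracy(true: Sequence[str], pred: Sequence[str],
                  causes: Optional[Sequence[str]] = None) -> float:
    """CSMF accuracy = 1 - sum_j |true_j - pred_j| / (2 * (1 - min_j true_j)).

    By default the cause list is every label seen in either input; pass the
    study's full cause list to match a fixed evaluation setup.
    """
    if not true or not pred:
        raise ValueError("inputs must be non-empty")
    if causes is None:
        causes = sorted(set(true) | set(pred))
    t = csmf(true, causes)
    p = csmf(pred, causes)
    total_error = sum(abs(t[c] - p[c]) for c in causes)
    denom = 2 * (1 - min(t.values()))
    if denom == 0:
        raise ValueError("CSMF accuracy is undefined when only one cause is present")
    return 1 - total_error / denom


# Hypothetical reference and predicted causes for four deaths.
ref = ["HIV/AIDS", "HIV/AIDS", "TB", "Injury"]
hat = ["HIV/AIDS", "TB", "TB", "Injury"]
print(individual_accuracy(ref, hat))       # 0.75
print(round(csmf_accuracy(ref, hat), 3))   # 0.667

# Errors that cancel out at the population level.
print(individual_accuracy(["HIV/AIDS", "TB"], ["TB", "HIV/AIDS"]))  # 0.0
print(csmf_accuracy(["HIV/AIDS", "TB"], ["TB", "HIV/AIDS"]))        # 1.0
```

The last example shows why both levels are worth reporting. Misclassifications that cancel out across the population can leave CSMF accuracy perfect even when no individual death is classified correctly.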
