MultiSense-Pneumo: A Multimodal Learning Framework for Pneumonia Screening in Resource-Constrained Settings
Dineth Jayakody, Pasindu Thenahandi, Chameli Dommanige

TL;DR
MultiSense-Pneumo is a multimodal framework combining symptoms, cough audio, speech, and radiographs for pneumonia screening in resource-limited settings, emphasizing transparency and offline capability.
Contribution
It introduces a modular, interpretable multimodal system optimized for low-resource environments, integrating diverse data sources for pneumonia screening.
Findings
Radiograph analysis remains robust under domain shifts.
The system operates fully offline on standard hardware.
Minority-class recall on acoustic signals needs improvement.
Abstract
Pneumonia remains a leading global cause of morbidity and mortality, particularly in low-resource settings where access to imaging, laboratory testing, and specialist care is limited. Clinical assessment relies on heterogeneous evidence, including symptoms, respiratory patterns, and chest imaging, making screening inherently multimodal. However, many existing computational approaches remain unimodal and focus primarily on radiographs. In this work, we present MultiSense-Pneumo, a multimodal framework for pneumonia-oriented screening and triage support that integrates structured symptom descriptors, cough audio, spoken language, and chest radiographs. The system combines deterministic symptom triage, LightGBM-based acoustic classification, domain-adversarial radiograph analysis using ResNet-18, transformer-based speech recognition, and an interpretable multimodal fusion operator. Each…
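The abstract names an interpretable multimodal fusion operator but does not define it in the portion shown here. As an illustration only, one simple interpretable choice is a weighted average of per-modality risk scores that renormalizes over whichever modalities are available and reports each modality's contribution. The modality names, the weights, and the `fuse` function below are assumptions made for this sketch and are not taken from the paper.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-modality weights, chosen for illustration only.
WEIGHTS: Dict[str, float] = {"symptoms": 0.3, "cough": 0.2, "radiograph": 0.5}


def fuse(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Tuple[float, Dict[str, float]]:
    """Combine per-modality risk scores in [0, 1] into one risk score.

    Missing modalities (value None) are skipped and the remaining
    weights are renormalized so they sum to 1. Returns the fused score
    and each modality's additive contribution to it.
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in weights
    }
    if not available:
        raise ValueError("no modality scores available")

    total_weight = sum(weights[name] for name in available)
    contributions = {
        name: weights[name] * score / total_weight
        for name, score in available.items()
    }
    return sum(contributions.values()), contributions


if __name__ == "__main__":
    # Example: the radiograph is unavailable, so only two modalities are fused.
    risk, parts = fuse({"symptoms": 1.0, "cough": 0.0, "radiograph": None})
    print(risk, parts)
```

Because the fused score is a plain sum of the per-modality contributions, a reviewer can see which input drove a given result, and the operator still returns a score when a modality such as the radiograph is missing.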
