PulmoFusion: Advancing Pulmonary Health with Efficient Multi-Modal Fusion
Ahmed Sharshar, Yasser Attia, Mohammad Yaqub, Mohsen Guizani

TL;DR
This paper introduces PulmoFusion, a non-invasive multimodal approach using energy-efficient neural networks and attention mechanisms to accurately assess pulmonary function from video and metadata, improving remote respiratory monitoring.
Contribution
The paper presents a novel multimodal fusion framework that combines SNNs with lightweight CNNs for pulmonary assessment, achieving state-of-the-art accuracy and robustness in non-invasive respiratory monitoring.
Findings
92% accuracy per breathing cycle (99.5% patient-wise) with SNN models on thermal data
Relative RMSE of 0.11 (thermal) and 0.26 (RGB) for PEF regression
MAE of 4.52% for FEV1/FVC predictions
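
The metrics above can be made concrete with a minimal sketch. The summary does not say how the Relative RMSE is normalised or how cycle-level predictions are aggregated per patient, so two assumptions are made here: the RMSE is divided by the mean of the true values, and the patient-wise label is a majority vote over that patient's breathing cycles. All function names are illustrative and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def relative_rmse(y_true, y_pred):
    """RMSE divided by the mean true value (assumed normalisation)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cycle_accuracy(labels, preds):
    """Fraction of individual breathing cycles classified correctly."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def patient_accuracy(patient_ids, labels, preds):
    """Fraction of patients whose majority-voted prediction is correct.

    Assumes every cycle of a patient shares one true label
    (hypothetical aggregation rule, not confirmed by the source).
    """
    votes = defaultdict(list)
    truth = {}
    for pid, label, pred in zip(patient_ids, labels, preds):
        votes[pid].append(pred)
        truth[pid] = label
    correct = sum(
        Counter(v).most_common(1)[0][0] == truth[pid] for pid, v in votes.items()
    )
    return correct / len(votes)
```

Under the majority-vote assumption, a single misclassified cycle lowers the cycle-level accuracy but can be outvoted by the patient's other cycles, which is one plausible reason a patient-wise figure can exceed the per-cycle figure.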
Abstract
Traditional remote spirometry lacks the precision required for effective pulmonary monitoring. We present a novel, non-invasive approach using multimodal predictive models that integrate RGB or thermal video data with patient metadata. Our method leverages energy-efficient Spiking Neural Networks (SNNs) for the regression of Peak Expiratory Flow (PEF) and classification of Forced Expiratory Volume (FEV1) and Forced Vital Capacity (FVC), using lightweight CNNs to overcome SNN limitations in regression tasks. Multimodal data integration is improved with a Multi-Head Attention Layer, and we employ K-Fold validation and ensemble learning to boost robustness. Using thermal data, our SNN models achieve 92% accuracy on a breathing-cycle basis and 99.5% patient-wise. PEF regression models attain Relative RMSEs of 0.11 (thermal) and 0.26 (RGB), with an MAE of 4.52% for FEV1/FVC predictions,…
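
As a rough illustration of how a multi-head attention layer can fuse metadata with video features, the sketch below lets a metadata embedding act as the query over per-frame feature vectors. The abstract does not specify which modality supplies queries, keys, or values, nor the dimensions or number of heads, so those choices are assumptions; the learned projection matrices of a real layer are also omitted for brevity. The names `attend` and `multi_head_fuse` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def multi_head_fuse(meta_vec, frame_feats, num_heads):
    """Fuse a metadata embedding with per-frame video features.

    Each head attends over its own slice of the feature dimension and the
    head outputs are concatenated. Learned projections are left out, so
    this shows only the attention mechanics (assumed arrangement).
    """
    dim = len(meta_vec)
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    fused = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        query = meta_vec[lo:hi]
        slices = [f[lo:hi] for f in frame_feats]
        fused.extend(attend(query, slices, slices))
    return fused
```

For example, `multi_head_fuse([0.0] * 4, [[0.0] * 4, [2.0] * 4], num_heads=2)` weights both frames equally because a zero query gives uniform attention, so the fused vector is the frame average.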
Taxonomy
Topics: Lung Cancer Diagnosis and Treatment
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Masked autoencoder · Spiking Neural Networks
