An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis

Yingchen Wei; Xihe Qiu; Xiaoyu Tan; Jingjing Huang; Wei Chu; Yinghui Xu; Yuan Qi

arXiv:2412.18919 · cs.CV · December 30, 2024

TL;DR

This paper introduces a multimodal dual-encoder deep learning framework that combines visual facial features and semantic data for accurate, efficient, and non-invasive diagnosis of obstructive sleep apnea-hypopnea syndrome, outperforming existing methods.

Contribution

The paper presents a novel multimodal dual-encoder model integrating visual and semantic information with attention mechanisms for OSAHS diagnosis, achieving state-of-the-art accuracy.
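
As a rough illustration of how attention can fuse the two modalities, the sketch below implements single-head scaled dot-product cross-attention in NumPy, with image tokens as queries and text tokens as keys and values. It is not the authors' architecture: the token counts, the feature width `d_model`, the random projection matrices, and the choice of which modality supplies the queries are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: each image token attends over all text tokens."""
    q = image_tokens @ w_q                      # (n_img, d)
    k = text_tokens @ w_k                       # (n_txt, d)
    v = text_tokens @ w_v                       # (n_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (n_img, n_txt)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 8                                     # hypothetical feature width
image_tokens = rng.normal(size=(4, d_model))    # e.g. 4 facial-region features (assumed)
text_tokens = rng.normal(size=(6, d_model))     # e.g. 6 text-token embeddings (assumed)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

fused, weights = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` shows how strongly one image token draws on each text token, and `fused` holds the resulting text-informed image features.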

Findings

01 Achieved 91.3% top-1 accuracy in four-class severity classification.

02 Improved diagnostic accuracy over existing facial image analysis methods.

03 Showed that cross-attention improves feature extraction across image and text inputs and that the ordered regression loss stabilizes learning.
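
The page does not give the formula for the ordered regression loss. One common way to make a loss respect class order, sketched below as an assumption rather than as the paper's method, is to replace the single four-way label with K-1 binary "severity exceeds threshold k" targets and sum the binary cross-entropies. The function names and the example logits are hypothetical.

```python
import math


def ordinal_targets(label, num_classes):
    """Encode a class index as K-1 binary 'severity > k' targets."""
    return [1.0 if label > k else 0.0 for k in range(num_classes - 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_loss(logits, label, num_classes=4):
    """Sum of binary cross-entropies over the K-1 threshold logits."""
    targets = ordinal_targets(label, num_classes)
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss


def predict(logits):
    """Predicted class index = number of thresholds the model believes are exceeded."""
    return sum(1 for z in logits if z > 0.0)


# Hypothetical threshold logits for one sample: confident about the first two
# thresholds, doubtful about the third.
logits = [3.0, 2.0, -2.0]
print(predict(logits))
for label in range(4):
    print(label, round(ordinal_loss(logits, label), 3))
```

With this encoding, a prediction that is two classes away from the true label violates more thresholds than one that is a single class away, so the penalty grows with the ordinal distance. A plain cross-entropy over four unordered classes has no such property.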

Abstract

Obstructive sleep apnea-hypopnea syndrome (OSAHS) is a common sleep disorder caused by upper airway blockage, leading to oxygen deprivation and disrupted sleep. Traditional diagnosis using polysomnography (PSG) is expensive, time-consuming, and uncomfortable. Existing deep learning methods using facial image analysis lack accuracy due to poor facial feature capture and limited sample sizes. To address this, we propose a multimodal dual encoder model that integrates visual and language inputs for automated OSAHS diagnosis. The model balances data using randomOverSampler, extracts key facial features with attention grids, and converts physiological data into meaningful text. Cross-attention combines image and text data for better feature extraction, and ordered regression loss ensures stable learning. Our approach improves diagnostic efficiency and accuracy, achieving 91.3% top-1 accuracy…
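
The balancing step named in the abstract, randomOverSampler, presumably refers to random oversampling as provided by the `RandomOverSampler` class in imbalanced-learn. The standard-library sketch below shows the underlying idea only: minority-class samples are duplicated at random until every class matches the largest one. The file names, label counts, and the `random_oversample` helper are hypothetical and are not taken from the paper or its repository.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement; k is 0 for the majority class, which adds nothing.
        extra = rng.choices(pool, k=target - n)
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Hypothetical imbalanced four-class dataset (class indices 0..3).
labels = [0, 0, 0, 0, 1, 1, 2, 3, 3, 3]
samples = [f"face_{i}.png" for i in range(len(labels))]

bal_samples, bal_labels = random_oversample(samples, labels)
print(Counter(labels))
print(Counter(bal_labels))
```

Oversampling of this kind should be applied to the training split only, since duplicating samples before a train/test split would leak copies of the same sample into evaluation.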

Code & Models

Repositories

luboyan6/VTA-OSAHS (PyTorch, official)

Taxonomy

Topics: Cardiovascular and Diving-Related Complications

Methods: Softmax · Attention Is All You Need