Toward Automated Regulatory Decision-Making: Trustworthy Medical Device Risk Classification with Multimodal Transformers and Self-Training
Yu Han, Aaron Ceross, and Jeroen H.M. Bergmann

TL;DR
This paper introduces a multimodal Transformer framework with self-training for accurate and trustworthy medical device risk classification, combining textual and visual data to outperform text-only and image-only baselines.
Contribution
It presents a novel multimodal Transformer model with cross-attention and self-training strategies for improved medical device risk classification under limited supervision.
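As a rough illustration of the cross-attention idea, the sketch below lets vectors from one modality (queries) attend over vectors from the other (keys and values) using scaled dot-product attention. It is a minimal sketch, not the paper's architecture: learned projection matrices, multiple heads, and the choice of text as the query side are all assumptions made here for illustration, and the toy vectors are invented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    Queries come from one modality, keys/values from the other.
    Learned projections are omitted (an assumption for brevity).
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, values)) for d in range(dim)]
        )
    return outputs, weights


# Toy example (hypothetical data): 2 text-token vectors attend over
# 3 image-patch vectors.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(text_tokens, image_patches, image_patches)
```

Each row of `attn` is a probability distribution over image patches, so `fused` holds one image-informed vector per text token. In a full model these would typically be combined with the original text features before classification.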
Findings
The approach achieved up to 90.4% accuracy and 97.9% AUROC on a real-world regulatory dataset.
Self-training improved SVM accuracy by 3.3 percentage points over standard multimodal fusion (from 87.1% to 90.4%) and macro-F1 by 1.4 points.
Ablation studies confirmed the benefits of cross-modal attention and self-training.
Abstract
Accurate classification of medical device risk levels is essential for regulatory oversight and clinical safety. We present a Transformer-based multimodal framework that integrates textual descriptions and visual information to predict device regulatory classification. The model incorporates a cross-attention mechanism to capture intermodal dependencies and employs a self-training strategy for improved generalization under limited supervision. Experiments on a real-world regulatory dataset demonstrate that our approach achieves up to 90.4% accuracy and 97.9% AUROC, significantly outperforming text-only (77.2%) and image-only (54.8%) baselines. Compared to standard multimodal fusion, the self-training mechanism improved SVM performance by 3.3 percentage points in accuracy (from 87.1% to 90.4%) and 1.4 points in macro-F1, suggesting that pseudo-labeling can effectively enhance…
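The self-training (pseudo-labeling) loop mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract reports results for an SVM, whereas this sketch swaps in a simple nearest-centroid classifier so that it runs with the standard library alone. The confidence `threshold`, the number of `rounds`, the class names, and the toy feature vectors are all hypothetical.

```python
import math


def centroids(features, labels):
    """Fit a nearest-centroid model: the mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_proba(model, x):
    """Class probabilities via softmax over negative centroid distances."""
    classes = sorted(model)
    scores = [-math.dist(x, model[c]) for c in classes]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, rounds=3):
    """Iteratively add confident pseudo-labels to the training set."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        model = centroids(train_x, train_y)
        remaining = []
        for x in pool:
            probs = predict_proba(model, x)
            label, conf = max(probs.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                # Confident prediction: treat it as a training label.
                train_x.append(x)
                train_y.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled; stop early
        pool = remaining
    return centroids(train_x, train_y), train_y, pool


# Toy fused features (hypothetical): two labeled devices, three unlabeled.
model, train_y, pool = self_train(
    [[0.0, 0.0], [4.0, 4.0]],
    ["low", "high"],
    [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]],
)
```

In this toy run the two unlabeled points near a centroid receive pseudo-labels, while the ambiguous midpoint stays in the unlabeled pool. The threshold trades off how much extra training data is gained against the risk of reinforcing wrong labels.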
Taxonomy
Methods: Softmax · Attention Is All You Need · Support Vector Machine
