On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications
Simon Baur, Alexandra Benova, Emilio Dolgener Cantú, Jackie Ma

TL;DR
This paper introduces a training method called multimodal privileged knowledge distillation (MMPKD) that enhances vision transformer models in medical imaging by leveraging additional data modalities during training, improving localization capabilities.
Contribution
The paper presents MMPKD, a novel training strategy that uses auxiliary modalities during training to improve unimodal vision transformer performance in medical diagnosis tasks.
Findings
MMPKD improves attention map localization in medical images.
The method enhances zero-shot ROI localization capabilities.
Cross-domain generalization remains limited.
Abstract
Deploying deep learning models in clinical practice often requires leveraging multiple data modalities, such as images, text, and structured data, to achieve robust and trustworthy decisions. However, not all modalities are always available at inference time. In this work, we propose multimodal privileged knowledge distillation (MMPKD), a training strategy that utilizes additional modalities available solely during training to guide a unimodal vision model. Specifically, we used a text-based teacher model for chest radiographs (MIMIC-CXR) and a tabular metadata-based teacher model for mammography (CBIS-DDSM) to distill knowledge into a vision transformer student model. We show that MMPKD can improve the zero-shot ability of the resulting attention maps to localize regions of interest (ROIs) in input images. However, this effect does not generalize across domains, contrary to what prior research suggested.
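The abstract does not state the exact distillation objective, so the following is only a minimal sketch of the general idea, assuming a standard temperature-scaled knowledge distillation loss rather than the authors' actual formulation. The teacher logits stand in for the output of a model that sees the privileged modality (report text or tabular metadata), while the student logits come from the image-only vision transformer. All function names and the `alpha` and `temperature` parameters are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the ground-truth class index."""
    return -math.log(probs[label] + eps)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a soft-target term and a hard-label term.

    Assumption: a generic distillation objective, not the paper's own loss.
    The teacher is only needed here, at training time; at inference the
    student runs on the image alone.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft-target term on a comparable scale
    # across temperatures.
    kd_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    ce_term = cross_entropy(softmax(student_logits), label)
    return alpha * kd_term + (1.0 - alpha) * ce_term


if __name__ == "__main__":
    # Toy binary example: the privileged teacher is confident in class 1,
    # the image-only student is still undecided.
    print(distillation_loss([0.1, 0.2], [-2.0, 3.0], label=1))
```

In this sketch the privileged modality enters only through `teacher_logits`, which is why the student needs no text or metadata once training is finished.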
Taxonomy
Topics: COVID-19 diagnosis using AI · AI in cancer detection · Multimodal Machine Learning Applications
