A scoping review on multimodal deep learning in biomedical images and texts
Zhaoyi Sun, Mingquan Lin, Qingqing Zhu, Qianqian Xie, Fei Wang, Zhiyong Lu, Yifan Peng

TL;DR
This scoping review summarizes the current state, applications, and research gaps of multimodal deep learning combining biomedical images and texts, aiming to advance diagnostic and interpretative systems.
Contribution
It provides a comprehensive overview of multimodal deep learning in biomedical data, identifying key concepts, study types, and future research directions.
Findings
Multimodal deep learning is applied to report generation, visual question answering, and diagnosis.
Research gaps include limited integration techniques and evaluation standards.
Diverse applications highlight the potential of multimodal deep learning (MDL) in biomedical fields.
Abstract
Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. This study reviewed the current uses of…
Taxonomy
MethodsFocus · Multimodal Deep Learning
