Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning
H M Dipu Kabir, Subrota Kumar Mondal, Mohammad Ali Moni

TL;DR
This paper introduces batch augmentation with unimodal fine-tuning for multimodal learning in medical ultrasound imaging. It improves fetal organ detection by combining image and text data and by pre-training the initial layers on medical data.
Contribution
It proposes a new training framework that integrates batch augmentation, unimodal fine-tuning, and multimodal data fusion, achieving improved performance on ultrasound datasets.
Findings
Achieved near state-of-the-art results on the UPMC Food-101 dataset.
Demonstrated improved fetal organ detection accuracy.
Provided open-source scripts for the proposed method.
Abstract
This paper proposes batch augmentation with unimodal fine-tuning to detect the fetus's organs from ultrasound images and associated clinical textual information. We also prescribe pre-training initial layers with investigated medical data before the multimodal training. At first, we apply a transferred initialization with the unimodal image portion of the dataset with batch augmentation. This step adjusts the initial layer weights for medical data. Then, we apply neural networks (NNs) with fine-tuned initial layers to images in batches with batch augmentation to obtain features. We also extract information from descriptions of images. We combine this information with features obtained from images to train the head layer. We write a dataloader script to load the multimodal data and use existing unimodal image augmentation techniques with batch augmentation for the multimodal data. The…
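The pipeline the abstract describes (augment the image part of each batch, extract image features, combine them with information from the text, and train a head on the result) can be illustrated with a minimal sketch. This is not the authors' script: every function name here is hypothetical, the random horizontal flip stands in for whatever augmentations the paper uses, the mean/max summary stands in for the fine-tuned network, and the bag-of-words counts stand in for the text processing. The point it shows is that augmentation touches only the images, while each description and label stays aligned with its image.

```python
import random


def hflip(image):
    """Mirror a 2-D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def augment_batch(images, texts, labels, p_flip=0.5, rng=None):
    """Augment the image part of one multimodal batch.

    Assumption: only images are augmented; texts and labels pass
    through unchanged and in the same order, so pairs stay aligned.
    """
    if rng is None:
        rng = random.Random()
    augmented = [hflip(img) if rng.random() < p_flip else img for img in images]
    return augmented, list(texts), list(labels)


def image_features(image):
    """Hypothetical stand-in for the fine-tuned image network."""
    flat = [value for row in image for value in row]
    return [sum(flat) / len(flat), max(flat)]


def text_features(text, vocab):
    """Hypothetical stand-in for text processing: bag-of-words counts."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def fuse(images, texts, vocab):
    """Concatenate image and text features; a head layer would train on these."""
    return [image_features(img) + text_features(txt, vocab)
            for img, txt in zip(images, texts)]


# Usage: one batch of two tiny "images" with paired descriptions.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
texts = ["head view", "heart view"]
labels = [0, 1]
vocab = ["head", "heart"]

aug_images, aug_texts, aug_labels = augment_batch(
    images, texts, labels, p_flip=1.0, rng=random.Random(0)
)
fused = fuse(aug_images, aug_texts, vocab)
```

Each fused vector has one entry per image feature plus one per vocabulary word. In a real setup the stand-ins would be replaced by a pre-trained network and a proper text encoder, and the fused vectors would feed the trainable head.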
Taxonomy
Topics: Fetal and Pediatric Neurological Disorders · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
