Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning

H M Dipu Kabir; Subrota Kumar Mondal; Mohammad Ali Moni

arXiv:2505.06592 · cs.CV · May 13, 2025

TL;DR

This paper introduces a batch augmentation and unimodal fine-tuning approach for multimodal learning in medical ultrasound imaging. It improves fetal organ detection by combining image and text data and by pre-training on medical data.

Contribution

The paper proposes a training framework that integrates batch augmentation, unimodal fine-tuning, and multimodal data fusion, and reports improved performance on ultrasound datasets.

Findings

01 Achieved near state-of-the-art results on the UPMC Food-101 dataset.

02 Demonstrated improved fetal organ detection accuracy.

03 Provided open-source scripts for the proposed method.

Abstract

This paper proposes batch augmentation with unimodal fine-tuning to detect fetal organs from ultrasound images and associated clinical textual information. We also recommend pre-training the initial layers on the medical data under investigation before the multimodal training. First, starting from a transferred initialization, we train on the unimodal image portion of the dataset with batch augmentation. This step adjusts the initial layer weights for medical data. Then, we apply neural networks (NNs) with the fine-tuned initial layers to images in batches, with batch augmentation, to obtain features. We also extract information from the descriptions of the images. We combine this information with the features obtained from the images to train the head layer. We write a dataloader script to load the multimodal data and use existing unimodal image augmentation techniques with batch augmentation for the multimodal data. The…
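The data flow described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' script: it assumes "batch augmentation" means repeating each sample several times with an independently augmented image while the paired text and label are copied unchanged, and every name here (`random_hflip`, `batch_augment`, `row_means`, `bag_of_words`, `fuse`, `VOCAB`) is hypothetical. The row-mean and bag-of-words functions are toy stand-ins for the fine-tuned image network and the text extractor.

```python
import random

# Hypothetical toy vocabulary used only for this illustration.
VOCAB = ["fetal", "brain", "abdomen"]


def random_hflip(image, rng):
    """Flip a 2-D image (list of rows) left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def batch_augment(images, texts, labels, copies, rng):
    """Repeat each sample `copies` times with an independently augmented image.

    Assumption: text and label are copied unchanged so that the two
    modalities stay aligned after augmentation.
    """
    out_images, out_texts, out_labels = [], [], []
    for image, text, label in zip(images, texts, labels):
        for _ in range(copies):
            out_images.append(random_hflip(image, rng))
            out_texts.append(text)
            out_labels.append(label)
    return out_images, out_texts, out_labels


def row_means(image):
    """Toy stand-in for features from a fine-tuned image network."""
    return [sum(row) / len(row) for row in image]


def bag_of_words(text):
    """Toy stand-in for information extracted from an image description."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def fuse(image_features, text_features):
    """Concatenate per-sample image and text features as input to a head layer."""
    return [img + txt for img, txt in zip(image_features, text_features)]


rng = random.Random(0)
images = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
texts = ["fetal brain", "fetal abdomen"]
labels = [0, 1]

aug_images, aug_texts, aug_labels = batch_augment(images, texts, labels, copies=3, rng=rng)
fused = fuse([row_means(i) for i in aug_images], [bag_of_words(t) for t in aug_texts])
print(len(fused), len(fused[0]))
```

In a real pipeline the fused vectors would feed a trainable head layer, and the flip would be replaced by whichever unimodal image augmentations the dataloader applies.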

Code & Models

Repositories

dipuk0506/multimodal
Official

Taxonomy

Topics: Fetal and Pediatric Neurological Disorders · Domain Adaptation and Few-Shot Learning · Face recognition and analysis