A Foundational Multi-Modal Model for Few-Shot Learning

Pengtao Dang; Tingbo Guo; Sha Cao; Chi Zhang

arXiv:2508.04746·cs.LG·August 8, 2025

TL;DR

This paper introduces a large multi-modal model framework trained on diverse scientific data to significantly enhance few-shot learning capabilities in data-scarce scientific fields.

Contribution

The study presents a novel multi-modal model framework and a curated dataset for few-shot learning, outperforming traditional meta-learning methods in scientific applications.

Findings

01

Outperforms conventional meta-learning models on few-shot tasks of the same type.

02

Created a comprehensive multi-modal dataset with over 10,000 samples.

03

Developed a flexible, scalable framework for scientific data types.

Abstract

Few-shot learning (FSL) is a machine learning paradigm that aims to generalize models from a small number of labeled examples, typically fewer than 10 per class. FSL is particularly crucial in biomedical, environmental, materials, and mechanical sciences, where samples are limited and data collection is often prohibitively costly, time-consuming, or ethically constrained. In this study, we present an innovative approach to FSL by demonstrating that a Large Multi-Modal Model (LMMM), trained on a set of independent tasks spanning diverse domains, task types, and input modalities, can substantially improve the generalization of FSL models, outperforming models based on conventional meta-learning on tasks of the same type. To support this, we first constructed a Multi-Modal Model Few-shot Dataset (M3FD, over 10K few-shot samples), which includes 2D RGB images, 2D/3D medical scans, tabular…
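To make the few-shot setting concrete, the sketch below builds one N-way K-shot episode: a small labeled support set (K examples per class) and a disjoint query set used for evaluation. This is a generic illustration of the standard FSL setup, not the paper's method or the M3FD loader; the function name `sample_episode` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, seed=None):
    """Return (support_idx, query_idx) for one N-way K-shot episode.

    `labels` is a list of class labels, one per sample. All names here are
    illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for both support and query qualify.
    eligible = sorted(
        c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])  # K labeled examples per class
        query.extend(picked[k_shot:])    # held-out examples for evaluation
    return support, query
```

With `k_shot` below 10, this matches the "fewer than 10 per class" regime described in the abstract: a model sees only the support indices at adaptation time and is scored on the query indices.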
