Human Knowledge Integrated Multi-modal Learning for Single Source Domain Generalization

Ayan Banerjee; Kuntal Thakur; Sandeep Gupta

arXiv:2603.12369 · cs.CV · March 16, 2026

TL;DR

This paper introduces a theoretical framework and a multimodal vision-language model approach to improve single-source domain generalization in medical image classification, where source and target domains may differ in unknown causal factors.

Contribution

It proposes domain conformal bounds (DCB) for assessing whether domains diverge in unknown causal factors, and GenEval, a model that combines foundational models with human knowledge to enhance generalization.
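The summary does not define DCB itself. As background intuition only, here is a generic split-conformal coverage check, a standard way of using conformal guarantees to flag distribution shift; it is not the paper's DCB. It assumes a scalar non-conformity score per example (for instance, one minus the classifier's probability for the true class), and the function names and toy scores are hypothetical.

```python
import math
from typing import Sequence

# Generic split-conformal coverage check; NOT the paper's domain conformal bounds.


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha; the bound is vacuous
    return sorted(cal_scores)[k - 1]


def empirical_coverage(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores that fall at or below the conformal threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


def coverage_gap(source_cal: Sequence[float], target: Sequence[float], alpha: float = 0.1) -> float:
    """Nominal coverage (1 - alpha) minus the coverage observed on the target domain.

    If source and target scores are exchangeable, coverage is at least 1 - alpha in
    expectation, so the gap should sit near or below zero; a large positive gap
    suggests the target distribution has shifted.
    """
    q = conformal_threshold(source_cal, alpha)
    return (1 - alpha) - empirical_coverage(target, q)


# Toy non-conformity scores (hypothetical values, for illustration only).
source = list(range(100))
in_dist = [s + 0.5 for s in source]   # held-out scores drawn from the same range
shifted = [s + 50 for s in source]    # scores from a shifted "target domain"
print(conformal_threshold(source, 0.1))          # 90
print(round(coverage_gap(source, in_dist), 2))   # 0.0
print(round(coverage_gap(source, shifted), 2))   # 0.49
```

The check needs only model scores, not metadata about how either domain was collected, which is the same constraint the paper works under; how DCB actually relates divergence to unknown causal factors is not captured here.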

Findings

01

GenEval achieves 69.2% accuracy on DR datasets, outperforming baselines by 9.4%.

02

GenEval achieves 81% accuracy on SOZ datasets, outperforming baselines by 1.8%.

03

The DCB framework helps evaluate domain divergence without access to metadata.

Abstract

Generalizing image classification across domains remains challenging in critical tasks such as fundus image-based diabetic retinopathy (DR) grading and resting-state fMRI seizure onset zone (SOZ) detection. When domains differ in unknown causal factors, achieving cross-domain generalization is difficult, and there is no established methodology to objectively assess such differences without direct metadata or protocol-level information from data collectors, which is typically inaccessible. We first introduce domain conformal bounds (DCB), a theoretical framework to evaluate whether domains diverge in unknown causal factors. Building on this, we propose GenEval, a multimodal vision-language model (VLM) approach that combines foundational models (e.g., MedGemma-4B) with human knowledge via Low-Rank Adaptation (LoRA) to bridge causal gaps and enhance single-source domain generalization…
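The abstract names LoRA as the mechanism for adapting MedGemma-4B. For readers unfamiliar with it, here is a minimal pure-Python sketch of the standard LoRA formulation on a single linear layer: the frozen weight W is augmented with a trainable low-rank product (alpha / r) B A. The layer sizes, rank r, scaling alpha, and initialization are illustrative assumptions, not the authors' configuration, and how GenEval injects human knowledge through the adapters is not shown.

```python
import random
from typing import List

Matrix = List[List[float]]

# Illustrative sketch of standard LoRA on one layer; not the GenEval training code.


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


class LoRALinear:
    """Linear layer y = x (W + (alpha / r) B A)^T with W frozen; only A and B would be trained."""

    def __init__(self, W: Matrix, r: int = 2, alpha: float = 4.0, seed: int = 0):
        d_out, d_in = len(W), len(W[0])
        rng = random.Random(seed)
        self.W = W  # frozen pretrained weight, shape (d_out, d_in)
        # A: (r, d_in) with small random init; B: (d_out, r) with zero init, so the
        # adapted layer initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.r = r
        self.scale = alpha / r

    def delta_weight(self) -> Matrix:
        """Low-rank update (alpha / r) * B @ A, shape (d_out, d_in)."""
        return [[self.scale * v for v in row] for row in matmul(self.B, self.A)]

    def forward(self, x: Matrix) -> Matrix:
        """x has shape (batch, d_in); returns shape (batch, d_out)."""
        W_eff = [[w + d for w, d in zip(w_row, d_row)]
                 for w_row, d_row in zip(self.W, self.delta_weight())]
        W_eff_T = [list(col) for col in zip(*W_eff)]
        return matmul(x, W_eff_T)

    def trainable_params(self) -> int:
        """r * (d_in + d_out) adapter parameters, versus d_in * d_out for full fine-tuning."""
        return self.r * (len(self.W[0]) + len(self.W))


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer = LoRALinear(W, r=2, alpha=4.0)
print(layer.forward([[1.0, 1.0]]))  # [[3.0, 7.0, 11.0]] because B starts at zero
layer.A = [[1.0, 0.0], [0.0, 1.0]]
layer.B = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(layer.forward([[1.0, 1.0]]))  # [[5.0, 9.0, 11.0]]
print(layer.trainable_params())     # 10
```

The zero-initialized B is the usual LoRA design choice: training starts from the unmodified foundation model, and only r * (d_in + d_out) parameters per adapted layer are updated, which is what makes adapting a multi-billion-parameter VLM feasible on a small single-source dataset.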

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · EEG and Brain-Computer Interfaces