INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla, Shivam Chandhok, Milan Aggarwal, Vineeth N Balasubramanian, Balaji Krishnamurthy

TL;DR
INDIGO leverages intrinsic multimodal features from pre-trained networks to improve domain generalization, achieving state-of-the-art results across various settings without extensive textual annotations.
Contribution
The paper introduces INDIGO, a novel approach that utilizes intrinsic multimodal information from pre-trained models to enhance domain generalization in vision tasks.
Findings
INDIGO outperforms existing methods on multiple domain generalization benchmarks.
It achieves state-of-the-art results in ClosedDG, OpenDG, and Limited sources settings.
The approach reduces the need for costly textual annotations in multimodal training.
Abstract
For models to generalize under unseen domains (a.k.a. domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that make up an object category. Recent advances towards weakly supervised vision-language models that learn holistic representations from cheap weakly supervised noisy text annotations have shown their ability on semantic understanding by capturing object characteristics that generalize under different domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on their number. This makes the process tedious and infeasible, hindering us from directly using these supervised vision-language approaches to achieve the best generalization on an unseen domain. Motivated by this, we study how…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
