Grounding Visual Representations with Texts for Domain Generalization

Seonwoo Min; Nokyung Park; Siwon Kim; Seunghyun Park; Jinkyu Kim

arXiv:2207.10285 · cs.CV · August 10, 2022

TL;DR

This paper uses natural language supervision to improve domain generalization in visual models, reporting state-of-the-art results on the DomainBed benchmark and on a newly created CUB-DG dataset.

Contribution

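The paper proposes two modules for grounding visual representations with texts, a Visual and Textual Joint Embedder and a Textual Explanation Generator. To the authors' knowledge, it is the first work to apply vision-and-language cross-modality supervision to domain generalization.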

Findings

01. Grounding with texts improved the domain invariance of the learned visual representations.

02. The method achieved state-of-the-art results on the DomainBed benchmark.

03. The method was also effective on the newly created CUB-DG dataset.

Abstract

Reducing the representational discrepancy between source and target domains is key to maximizing model generalization. In this work, we advocate for leveraging natural language supervision for the domain generalization task. We introduce two modules to ground visual representations with texts that capture typical human reasoning: (1) a Visual and Textual Joint Embedder and (2) a Textual Explanation Generator. The former learns an image-text joint embedding space in which we can ground high-level class-discriminative information into the model. The latter leverages an explainable model and generates explanations justifying the rationale behind its decision. To the best of our knowledge, this is the first work to leverage the vision-and-language cross-modality approach for the domain generalization task. Our experiments with a newly created CUB-DG benchmark dataset demonstrate…
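
The idea of an image-text joint embedding space can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the loss, the projection layers, or the feature extractors, so everything below is an assumption made for illustration. The sketch assumes that image and text features are already available as plain vectors, that each modality is mapped into a shared space by a bias-free linear projection (the hypothetical `w_visual` and `w_text`), and that paired embeddings are pulled together by a mean squared distance after L2 normalization (the hypothetical `alignment_loss`).

```python
import math
import random


def linear(x, weight):
    """Apply a bias-free linear map; weight is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def alignment_loss(visual_feats, text_feats, w_visual, w_text):
    """Mean squared distance between paired image and text embeddings.

    Assumption: this particular loss is chosen for illustration only;
    the source does not state which objective the paper uses.
    """
    total = 0.0
    for v, t in zip(visual_feats, text_feats):
        z_visual = l2_normalize(linear(v, w_visual))  # image -> joint space
        z_text = l2_normalize(linear(t, w_text))      # text  -> joint space
        total += sum((a - b) ** 2 for a, b in zip(z_visual, z_text))
    return total / len(visual_feats)


def random_matrix(rows, cols, rng):
    """Helper producing toy data; stands in for real features and weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
visual_feats = random_matrix(4, 8, rng)  # 4 images, 8-d visual features (toy)
text_feats = random_matrix(4, 6, rng)    # 4 paired texts, 6-d text features (toy)
w_visual = random_matrix(5, 8, rng)      # projects visual features to 5-d joint space
w_text = random_matrix(5, 6, rng)        # projects text features to 5-d joint space

loss = alignment_loss(visual_feats, text_feats, w_visual, w_text)
print(f"alignment loss on toy data: {loss:.4f}")
```

In a real training setup, a term of this kind would presumably be minimized alongside the classification loss so that the visual encoder absorbs class-discriminative information carried by the text. Because both embeddings are normalized to unit length, the per-pair squared distance is bounded between 0 and 4.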

Code & Models

Repositories

mswzeus/gvrt (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling