TRUST: Leveraging Text Robustness for Unsupervised Domain Adaptation
Mattia Litrico, Mario Valerio Giuffrida, Sebastiano Battiato, Devis Tuia

TL;DR
TRUST introduces a novel unsupervised domain adaptation method that leverages language modality robustness, using caption-based pseudo-labels and multimodal contrastive learning to improve vision model adaptation across complex domain shifts.
Contribution
The paper proposes TRUST, a new approach that exploits language modality robustness, uses uncertainty-aware pseudo-labeling, and employs multimodal contrastive learning for improved domain adaptation.
Findings
Outperforms previous methods on DomainNet and GeoNet datasets.
Sets a new state of the art under complex domain shifts.
Effectively mitigates pseudo-label errors using uncertainty estimation.
Abstract
Recent unsupervised domain adaptation (UDA) methods have shown great success in addressing classical domain shifts (e.g., synthetic-to-real), but they still suffer under complex shifts (e.g., geographical shift), where both the background and object appearances differ significantly across domains. Prior works showed that the language modality can help in the adaptation process, exhibiting more robustness to such complex shifts. In this paper, we introduce TRUST, a novel UDA approach that exploits the robustness of the language modality to guide the adaptation of a vision model. TRUST generates pseudo-labels for target samples from their captions and introduces a novel uncertainty estimation strategy that uses normalised CLIP similarity scores to estimate the uncertainty of the generated pseudo-labels. Such estimated uncertainty is then used to reweight the classification loss, mitigating…
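The reweighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact normalisation or weighting formula, so everything below is an assumption made for illustration: similarity scores are normalised with a temperature-scaled softmax, the confidence of a pseudo-label is taken to be its normalised score, and that confidence directly scales the per-sample cross-entropy. The function names and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores, temperature=1.0):
    """Normalise raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_with_confidence(similarities, temperature=0.01):
    """Pick a pseudo-label from caption-to-class similarity scores.

    `similarities` stands in for CLIP similarity scores between a caption
    and one text prompt per class (hypothetical input). The confidence is
    the normalised score of the winning class (an assumed definition).
    """
    probs = softmax(similarities, temperature)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def weighted_cross_entropy(pred_probs, labels, weights):
    """Mean cross-entropy where each sample is scaled by its weight.

    Low-confidence pseudo-labels get small weights, so a wrong
    pseudo-label contributes less to the loss.
    """
    eps = 1e-12  # guards against log(0)
    losses = [
        -w * math.log(p[y] + eps)
        for p, y, w in zip(pred_probs, labels, weights)
    ]
    return sum(losses) / len(losses)
```

With these definitions, a caption whose scores clearly favour one class (for example `[0.30, 0.21, 0.22]`) receives a confidence close to 1, while a caption with two tied classes (for example `[0.25, 0.25, 0.20]`) receives a confidence near 0.5 and therefore roughly half the influence on the loss.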
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
