SHED: Style-Homogenized Embedding Alignment for Domain Generalization

Kai Gan; Tong Wei

arXiv:2605.16973 · cs.CV · May 19, 2026

TL;DR

SHED introduces a style-homogenized embedding alignment method for CLIP to improve domain generalization by removing domain-specific styles from embeddings during training and inference.

Contribution

The paper proposes a novel style-homogenized embedding alignment technique that makes CLIP more robust to unseen domains in domain generalization tasks.

Findings

01. SHED achieves state-of-the-art results on five benchmarks.

02. It significantly outperforms prior methods, e.g., by +4.0% on DomainNet.

03. It effectively removes domain-specific styles from embeddings.

Abstract

Domain generalization aims to make models robust to unseen domains that exhibit embedding distribution shifts. While large-scale vision-language models such as CLIP generalize strongly, their direct image-text embedding alignment suffers from an inherent information asymmetry: images encode both class semantics and domain-specific styles, whereas text prompts mainly convey basic class cues. This asymmetry hinders generalization to novel domains in realistic scenarios. To address it, we propose Style-Homogenized Embedding alignment for Domain-generalization (SHED), a novel CLIP-based method that aligns style-homogenized embeddings rather than the raw representations produced by CLIP's encoders. During training, SHED removes domain-specific style centroids from both the image embeddings, computed per source domain, and the text embeddings, which are averaged across diverse prompt templates and…
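The central step described in the abstract is subtracting style centroids from embeddings before image-text alignment. The NumPy sketch below illustrates that idea; it is not the authors' implementation. Because the abstract is truncated, the centroid definitions are assumptions. Each image style centroid is taken to be the plain mean of L2-normalized image embeddings within one source domain. The text centroid is taken to be the mean over classes of the template-averaged class embeddings. The helper names (`homogenize_images`, `homogenize_text`, `logits`) and the temperature value are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm (CLIP compares embeddings by cosine similarity)."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def homogenize_images(img_emb, domain_ids):
    """Subtract each source domain's mean embedding, then renormalize.

    img_emb: (N, D) image embeddings; domain_ids: (N,) integer source-domain labels.
    ASSUMPTION: the per-domain "style centroid" is the plain mean of the
    normalized embeddings belonging to that domain.
    """
    z = l2_normalize(img_emb)
    out = np.empty_like(z)
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        centroid = z[mask].mean(axis=0)  # hypothetical per-domain style centroid
        out[mask] = z[mask] - centroid
    return l2_normalize(out)


def homogenize_text(txt_emb):
    """Average over prompt templates, then subtract the centroid over classes.

    txt_emb: (T, K, D) embeddings for T prompt templates x K classes.
    ASSUMPTION: the text "style centroid" is the mean of the K
    template-averaged class embeddings.
    """
    per_class = l2_normalize(l2_normalize(txt_emb).mean(axis=0))  # (K, D)
    centroid = per_class.mean(axis=0)  # hypothetical text style centroid
    return l2_normalize(per_class - centroid)


def logits(img_h, txt_h, temperature=0.01):
    """Cosine-similarity logits between homogenized image and text embeddings."""
    return img_h @ txt_h.T / temperature


# Usage with random stand-ins for CLIP features: 2 domains, 3 classes, 4 templates.
rng = np.random.default_rng(0)
img = rng.normal(size=(6, 8))
dom = np.array([0, 0, 0, 1, 1, 1])
txt = rng.normal(size=(4, 3, 8))
img_h = homogenize_images(img, dom)
txt_h = homogenize_text(txt)
print(logits(img_h, txt_h).shape)  # (6, 3)
```

Subtracting a shared centroid removes the component common to every embedding in a group, which this sketch treats as "style". Renormalizing afterward keeps the logits as scaled cosine similarities, as in standard CLIP alignment.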
