Towards Unified and Effective Domain Generalization
Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue

TL;DR
UniDG is a unified, inference-time finetuning framework that improves the out-of-distribution generalization of foundation models across architectures. It learns the test-data distribution without labels and penalizes parameter updates to limit catastrophic forgetting.
Contribution
It introduces an inference-stage finetuning method with a penalty on parameter updates, improving domain generalization without the cost of iterative training.
Findings
Average accuracy improvement of +5.4% on DomainBed
Effective across 12 diverse visual backbones
Reduces catastrophic forgetting during finetuning
Abstract
We propose UniDG, a novel and Unified framework for Domain Generalization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures. The core idea of UniDG is to finetune models during the inference stage, which saves the cost of iterative training. Specifically, we encourage models to learn the distribution of test data in an unsupervised manner and impose a penalty regarding the updating step of model parameters. The penalty term can effectively reduce the catastrophic forgetting issue as we would like to maximally preserve the valuable knowledge in the original model. Empirically, across 12 visual backbones, including CNN-, MLP-, and Transformer-based models, ranging from 1.89M to 303M parameters, UniDG shows an average accuracy improvement of +5.4%…
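The idea of adapting on unlabeled test data while penalizing the update step can be illustrated with a toy example. The sketch below is not the UniDG algorithm: it assumes prediction-entropy minimization as the unsupervised objective and a squared L2 distance to the original weights as the penalty, neither of which the abstract specifies. The linear softmax classifier and the names `adapt`, `lam`, and `drift` are hypothetical and chosen only for illustration.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)

def logits(W, x):
    # Linear classifier: one weight row per class.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def mean_entropy(W, X):
    return sum(entropy(softmax(logits(W, x))) for x in X) / len(X)

def drift(W, W0):
    # Frobenius distance between adapted and original weights.
    return math.sqrt(sum((a - b) ** 2
                         for r, r0 in zip(W, W0) for a, b in zip(r, r0)))

def adapt(W0, X, lam, lr=0.1, steps=20):
    """Gradient descent on mean_entropy(W, X) + lam * ||W - W0||^2.

    Assumed objective for illustration only; X is unlabeled test data.
    """
    W = [row[:] for row in W0]
    for _ in range(steps):
        G = [[0.0] * len(row) for row in W]
        for x in X:
            p = softmax(logits(W, x))
            h = entropy(p)
            for j, pj in enumerate(p):
                # dH/dz_j = -p_j * (log p_j + H) for softmax entropy.
                dz = -pj * (math.log(pj) + h)
                for d, xd in enumerate(x):
                    G[j][d] += dz * xd / len(X)
        for j in range(len(W)):
            for d in range(len(W[j])):
                # Penalty gradient pulls W back toward the original W0.
                G[j][d] += 2.0 * lam * (W[j][d] - W0[j][d])
                W[j][d] -= lr * G[j][d]
    return W

# Toy "source" weights and three unlabeled "test" points (made up).
W0 = [[1.0, 0.0], [0.0, 1.0]]
X = [[2.0, 1.0], [0.5, 1.5], [1.0, 0.8]]
for lam in (0.0, 2.0):
    W = adapt(W0, X, lam)
    print(lam, round(mean_entropy(W, X), 4), round(drift(W, W0), 4))
```

With `lam=0.0` the weights are free to move wherever the unsupervised loss points, while a positive `lam` keeps them close to the original model. In this toy setting, that is the trade-off the abstract describes between fitting the test distribution and preserving existing knowledge.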
Peer Reviews
Decision·ICLR 2024 Conference Withdrawn Submission
- The catastrophic forgetting issue during TTA for domain generalization is well motivated.
- The discussion of related work is insufficient. The related work section simply lists many prior works but does not discuss how the proposed method relates to them.
- This paper reads more like a Test-Time Domain Adaptation work, so I think Test-Time Domain Adaptation is a more suitable framing than Domain Generalization.
- I don't believe it is the first time to discuss the catastrophic forgetting issue during TTA f
- Proposes a trade-off between freezing the encoder, which would lead to underfitting, and updating the decoder, which would lead to catastrophic forgetting.
- Consistently improved benchmarks.
- The theoretical insight into why marginal generalization matters for generalization to unseen domains is explained well in the Appendix but is very unclear in the main paper. This important aspect should be better discussed in the main text.
- The motivation for the Differentiable Memory Bank should also be written more clearly.
In other words, marginal generalization is proposed to update the encoder during Test-Time Adaptation (TTA), and a differentiable memory bank is proposed to refine features for DG. Experiments on five datasets, including VLCS, PACS, and OfficeHome, demonstrate superiority over SOTA methods across 12 different network architectures.
The structure of the paper does not match what most ML readers expect. In particular, the related work section should follow the introduction, which would give the logical flow typical of ML conference papers.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
