PEER pressure: Model-to-Model Regularization for Single Source Domain Generalization
Dong Kyu Cho, Inwoo Hwang, Sanghack Lee

TL;DR
PEER introduces a model-to-model regularization technique that stabilizes training and improves single source domain generalization by using a proxy model to accumulate knowledge and reduce feature distortion.
Contribution
The paper proposes PEER, a novel regularization method employing a proxy model and mutual information maximization to enhance domain generalization and reduce performance fluctuation.
Findings
PEER reduces out-of-distribution performance fluctuation.
PEER achieves state-of-the-art results with simple augmentation.
PEER outperforms prior methods on multiple datasets.
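One way to make "out-of-distribution performance fluctuation" concrete is sketched below. It assumes fluctuation is measured as the variance of target-domain accuracy across saved checkpoints, which matches how one reviewer describes it but may not be the paper's exact metric. The function name `ood_fluctuation` and the accuracy values are hypothetical and purely illustrative; they are not results from the paper.

```python
from statistics import pvariance
from typing import Sequence


def ood_fluctuation(target_accuracies: Sequence[float]) -> float:
    """Population variance of target-domain accuracy across checkpoints.

    Assumption: variance is used as the fluctuation measure; the paper
    may define fluctuation differently.
    """
    if len(target_accuracies) < 2:
        raise ValueError("need accuracies from at least two checkpoints")
    return pvariance(target_accuracies)


# Made-up accuracy curves (percent), one value per checkpoint.
stable_run = [70.0, 71.0, 70.0, 71.0]
unstable_run = [60.0, 75.0, 62.0, 74.0]

stable_score = ood_fluctuation(stable_run)
unstable_score = ood_fluctuation(unstable_run)
print(f"stable: {stable_score:.4f}, unstable: {unstable_score:.4f}")
```

Under this measure, a run whose target accuracy swings between checkpoints scores higher than one that stays flat, even when the two runs have similar average accuracy.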
Abstract
Data augmentation is a popular tool for single source domain generalization, which expands the source domain by generating simulated ones, improving generalization on unseen target domains. In this work, we show that the performance of such augmentation-based methods in the target domains universally fluctuates during training, posing challenges in model selection under realistic scenarios. We argue that the fluctuation stems from the inability of the model to accumulate the knowledge learned from diverse augmentations, exacerbating feature distortion during training. Based on this observation, we propose a novel generalization method, coined Parameter-Space Ensemble with Entropy Regularization (PEER), that uses a proxy model to learn the augmented data on behalf of the main model. The main model is updated by averaging its parameters with the proxy model, progressively accumulating…
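The parameter-averaging step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is truncated before it specifies the averaging schedule, so the interpolation weight `proxy_weight`, the function name `average_parameters`, and the representation of a model as a dictionary of flat parameter lists are all assumptions made here.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def average_parameters(main: Params, proxy: Params,
                       proxy_weight: float = 0.5) -> Params:
    """Element-wise interpolation between main and proxy parameters.

    proxy_weight=0.5 gives a plain average of the two models. The
    weighting actually used by PEER is not stated in the source.
    """
    if not 0.0 <= proxy_weight <= 1.0:
        raise ValueError("proxy_weight must lie in [0, 1]")
    if main.keys() != proxy.keys():
        raise ValueError("models must share the same parameter names")
    averaged: Params = {}
    for name in main:
        if len(main[name]) != len(proxy[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [
            (1.0 - proxy_weight) * m + proxy_weight * p
            for m, p in zip(main[name], proxy[name])
        ]
    return averaged


# Toy parameters: the proxy has drifted after training on augmented data.
main_params = {"w": [0.0, 2.0], "b": [1.0]}
proxy_params = {"w": [2.0, 4.0], "b": [3.0]}

# The main model absorbs the proxy's update by averaging.
merged = average_parameters(main_params, proxy_params)
print(merged)
```

In a real training loop the same interpolation would be applied to every tensor in the two networks after each round of proxy training, so the main model never takes a raw gradient step on the augmented data itself.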
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The overall structure is logical, with a clear flow from the identification of the problem to the proposed solution and subsequent empirical validation. The experimental setup and baselines are well described.
1. The core idea of parameter averaging is not new, and the contribution may not be seen as significantly advancing beyond existing ensemble and regularization techniques.
2. The authors argue that data augmentation leads to feature distortion. However, it is unclear what feature distortion means, and its existence is not substantiated by theoretical or experimental analysis.
3. The authors also argue that better generalization correlates with larger fluctuation during training. However, the emp
The studied problem is practical and interesting. The overall findings are reasonable, and the proposed method appears technically sound. Compared with RandAug, the method achieves significant improvements.
The paper still lacks key discussions and comparisons to clarify its originality and contributions. Please refer to my questions for detailed comments.
* The paper is well-written and well-motivated.
* They investigate different components of their method thoroughly and in detail with different ablations.
* Their method is simple but effective in most cases.
* The novelty of their method is limited and incremental. They combined the Barlow Twins [1] loss and teacher-student architecture methods (i.e., DINO [2]) in a supervised setting.
* The connection between the variance of target domain accuracy and generalizability is missing. The primary motivation of the paper is to decrease the augmentation distortion to improve the model generalization (Table 1, Figure 3, and Table 5), which they measure with OOD fluctuation. However, in Table 5, a pretrai
The authors show that the more dissimilar the target domain is to the source domain, the larger the fluctuation. They also show that the fluctuation shrinks when data augmentation is used to reduce this dissimilarity.
- The novelty of the parameter-averaging methodology is limited. Is the proposed methodology specialized for domain generalization? How does it differ from other ensemble methods?
- The authors argue that the parameter-averaged task model guides the proxy model's learning process, reducing the OOD fluctuation. However, the motivation behind the argument is not well explained. Is there any theoretical guarantee for this argument?
- For the OOD fluctuation, the authors investigate two
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis
Methods: Entropy Regularization
