ERM++: An Improved Baseline for Domain Generalization

Piotr Teterwak; Kuniaki Saito; Theodoros Tsiligkaridis; Kate Saenko; Bryan A. Plummer

arXiv:2304.01973 · cs.LG · December 11, 2024 · 1 citation

TL;DR

ERM++ enhances the standard empirical risk minimization baseline for domain generalization by tuning overlooked training factors, significantly boosting performance across datasets with minimal added complexity.

Contribution

This paper introduces ERM++, a simple improvement over ERM that raises domain generalization performance by tuning previously overlooked training factors: training data utilization (training duration and validation splits), initialization, and additional regularizers.

Findings

01

ERM++ outperforms prior ERM baselines by over 5% on standard benchmarks.

02

ERM++ surpasses all state-of-the-art methods on DomainBed datasets.

03

ERM++ is easy to implement and integrate into existing frameworks.

Abstract

Domain Generalization (DG) aims to develop classifiers that can generalize to new, unseen data distributions, a critical capability when collecting new domain-specific data is impractical. A common DG baseline minimizes the empirical risk on the source domains. Recent studies have shown that this approach, known as Empirical Risk Minimization (ERM), can outperform most more complex DG methods when properly tuned. However, these studies have primarily focused on a narrow set of hyperparameters, neglecting other factors that can enhance robustness and prevent overfitting and catastrophic forgetting, properties which are critical for strong DG performance. In our investigation of training data utilization (i.e., duration and setting validation splits), initialization, and additional regularizers, we find that tuning these previously overlooked factors significantly improves model…
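As a rough illustration of the plain ERM baseline the abstract describes (not the ERM++ recipe itself), the sketch below pools several source domains, minimizes the average logistic loss on the pooled data, and evaluates on a domain never seen in training. The toy 1-D data and the names `make_domain`, `erm_fit`, and `accuracy` are hypothetical, chosen only for this example.

```python
import math
import random


def make_domain(mean, n, rng):
    """Toy domain (an assumption for illustration): inputs are drawn around
    `mean`, and the label rule (x > 0) is shared by every domain."""
    xs = [rng.gauss(mean, 1.0) for _ in range(n)]
    return [(x, 1 if x > 0 else 0) for x in xs]


def erm_fit(source_domains, steps=300, lr=0.5):
    """ERM: pool all source-domain samples and minimize the mean log loss
    of a 1-D logistic model with full-batch gradient descent."""
    pooled = [example for domain in source_domains for example in domain]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in pooled:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(y = 1)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(pooled)
        b -= lr * grad_b / len(pooled)
    return w, b


def accuracy(params, data):
    """Fraction of examples whose predicted class matches the label."""
    w, b = params
    return sum((w * x + b > 0) == (y == 1) for x, y in data) / len(data)


rng = random.Random(0)
sources = [make_domain(m, 200, rng) for m in (-1.0, 0.0, 1.0)]
target = make_domain(2.0, 200, rng)  # unseen domain, used only for evaluation
params = erm_fit(sources)
print("target accuracy:", accuracy(params, target))
```

The target domain plays no part in fitting. In the DG setting, any model selection would likewise rely on held-out source data, which is where factors such as validation splits and training duration come in.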

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

The obvious strength of the paper is the exciting results. The experimental settings are carefully described.

Weaknesses

Despite presenting exciting results with elaborate experiments, the paper lacks technical insight into the effectiveness of the various components, especially when they are used together. While I do see the merit of the engineering approach and agree that the field should appropriately acknowledge this as a baseline for large DomainBed, I do not think the current contribution of ERM++ is fit for a venue like ICLR. Thus, I cannot recommend acceptance of the paper.

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The strategies used to maximize ERM performance for domain generalization are practically useful, and the resultant model is a competitive baseline for future work in the area. 2. Extensive experiments were performed with many different methods, architectures, and datasets. The ablation studies are also well done. 3. Careful analysis of the results was performed and edge cases were highlighted in the text. In particular, many of my initial questions were answered upon a closer read of the an

Weaknesses

1. While the premise of the contribution - that ERM can match SOTA DG algorithms when appropriate data utilization, initialization, and regularization are applied - is important, the strong performance of ERM has been known since [1] and is not exactly novel. The main contribution of this paper is in applying recent “tricks of the trade” to further boost ERM numbers. While this may be helpful for practitioners, no new insight is offered as to why the proposed techniques are useful for ERM specif

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The experiments are dense and comprehensive. The proposed baseline is evaluated on most existing domain generalisation benchmarks. 2. ERM++ explores existing practical techniques in DG and describes them in detail.

Weaknesses

1. The name is a little misleading, as MPA is part of ERM++. If this is the case, it is natural to wonder how ERM + MPA would perform, and how ERM++ would perform with MPA removed. I think the closest setting is Table 4. Once MPA is added, for example comparing #1 and #2, the performance improves significantly. The remaining settings add each technique cumulatively, one by one. But it is also important to know whether each one contributes independently. 2. One of the main points made in Do

Reviewer 04 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The paper investigates many techniques for the ERM model to improve multi-source domain generalization, which helps other researchers in this field to find suitable methods for their research. 2. The paper provides rich experiments to show the benefits of the utilized techniques for domain generalization.

Weaknesses

1. Although the paper includes many techniques to improve the ERM model on domain generalization, most of these techniques have already been proposed or are widely known. The improvement from these techniques is neither surprising nor inspiring. 2. The paper claims to propose a general baseline for future domain generalization work using existing techniques. However, the experiments, such as Table 3 (a), Table 4, and Table 5, show that different techniques benefit different datasets or settings, which

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research