DISPEL: Domain Generalization via Domain-Specific Liberating
Chia-Yuan Chang, Yu-Neng Chuang, Guanchu Wang, Mengnan Du, Na Zou

TL;DR
DISPEL is a flexible post-processing masking method that improves domain generalization by filtering out domain-specific features in the embedding space, yielding better performance on unseen domains across five benchmarks.
Contribution
The paper proposes DISPEL, a masking approach that filters out domain-specific features to improve generalization, without requiring domain labels or introducing prediction-irrelevant noise.
Findings
DISPEL outperforms existing domain generalization methods on five benchmarks.
The method can be applied to various fine-tuned models.
The authors derive a generalization error bound that they argue guarantees performance improvements.
Abstract
Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to identify and distinguish from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input data…
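The per-input masking step described in the abstract can be illustrated with a minimal forward-pass sketch. Everything here is an assumption made for illustration: the helper names (`linear`, `random_layer`, `masked_predict`), the random weights standing in for a frozen fine-tuned encoder and classifier, and the temperature-scaled sigmoid used to produce a soft mask. The excerpt does not say how the mask generator is parameterized or trained, so this shows only how a mask generated from each input could be applied element-wise to that input's embedding.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(weights, bias, x):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Hypothetical stand-in for trained weights (not from the paper)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def masked_predict(x, encoder, mask_generator, classifier, temperature=1.0):
    # Embedding from the (assumed frozen) fine-tuned encoder.
    z = [math.tanh(v) for v in linear(*encoder, x)]
    # The mask generator sees the same input and emits one value per
    # embedding dimension, so every input gets its own mask.
    mask = [sigmoid(v / temperature) for v in linear(*mask_generator, x)]
    # Element-wise masking in the embedding space.
    z_masked = [m * zi for m, zi in zip(mask, z)]
    # The (assumed frozen) classifier predicts from the masked embedding.
    scores = linear(*classifier, z_masked)
    return mask, z, z_masked, scores


rng = random.Random(0)
IN_DIM, EMB_DIM, N_CLASSES = 4, 6, 3
encoder = random_layer(rng, EMB_DIM, IN_DIM)
mask_generator = random_layer(rng, EMB_DIM, IN_DIM)
classifier = random_layer(rng, N_CLASSES, EMB_DIM)

x_a = [0.5, -1.0, 0.3, 2.0]
x_b = [-0.7, 0.2, 1.5, -0.4]
mask_a, z_a, zm_a, scores_a = masked_predict(x_a, encoder, mask_generator, classifier)
mask_b, z_b, zm_b, scores_b = masked_predict(x_b, encoder, mask_generator, classifier)
```

In this sketch `mask_a` and `mask_b` differ because each is computed from its own input, which is the sample-level behavior the abstract contrasts with a single global mask. A soft mask in (0, 1) is used only to keep the example simple.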
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
$\textbf{Performance and practicality}$: DISPEL consistently produces superior domain generalization performance over existing methods on 5 benchmarks, with a relatively simple implementation, i.e., applying trainable masking in the embedding space. Furthermore, it does not require domain label information and can be used additively with any deep learning based finetuning method as demonstrated in Table 4. $\textbf{Novelty}$: To my knowledge, this is the first attempt in employing dynamic, inst
$\textbf{Performance Stability}$: As stated in Sec 4.3.1, DISPEL possesses stable generalizing efficacy (Observation 2). If “stability” here means that there is no significant performance degradation across a wide range of benchmarks, I’m not sure the empirical evaluation is convincing enough to support the claim. I would argue that, for example, Mixup and MIRO are “stable” enough. Furthermore, the early premise about the need for DISPEL as a more stable solution over the global m
The method proposes a form of sample-level domain-specific feature identification, which is supposed to be more flexible than existing global-level methods given the complex domain information in real applications.
Many publications also try to solve DG by separating the representation into domain-specific and domain-invariant parts, but they are not mentioned in this work. The analysis of the mask learning is not sufficient to support the authors' idea.
- The proposed method is simple but effective. The proposed method achieves state-of-the-art performance in some cases, even without leveraging domain labels and any data augmentation method.
- The experiments are nicely designed. Error bars are given. The experimental results are mostly significant.
- Source code is available, which is critically important in domain generalization, where reproducibility is a crucial problem.
- The paper is easy to follow.
- (Critical) The underlying mechanism, i.e., why DISPEL works well in domain generalization, is not completely clear (see Questions). I would like to know more about why DISPEL works well in unseen target domains.
- The main idea of decomposing features into domain-invariant and domain-specific features has been explored and is not novel.
The authors propose a simple method to improve the performance of a pre-trained model on unseen test domains. The method demonstrates a stable improvement of a few percent across almost all benchmarks. The method doesn't require labels from source domains.
The main weakness of the paper is the theoretical justification of the method. Unfortunately, Theorem 1 has nothing to do with the generalization error on unseen domains, or with the original definition of the generalization error in the first place. Proper definitions can be found, for example, in the works of Ben-David (e.g. "A theory of learning from different domains").
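For context, the kind of statement this reviewer points to can be written as the population form of the domain adaptation bound from "A theory of learning from different domains". The notation below follows that work and is not taken from the DISPEL manuscript; it is included only as a reference point for what a bound on target-domain error typically looks like.

$$
\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \big[\epsilon_S(h') + \epsilon_T(h')\big]
$$

Here $\epsilon_S(h)$ and $\epsilon_T(h)$ are the errors of hypothesis $h$ on the source and target domains, $d_{\mathcal{H}\Delta\mathcal{H}}$ measures the divergence between the two input distributions as seen by the hypothesis class $\mathcal{H}$, and $\lambda$ is the combined error of the best single hypothesis on both domains.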
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Cancer-related molecular mechanisms research
