Out-Of-Domain Unlabeled Data Improves Generalization
Amir Hossein Saberi, Amir Najafi, Alireza Heidari, Mohammad Hosein Movasaghinia, Abolfazl Motahari, Babak H. Khalaj

TL;DR
This paper introduces a framework combining Distributionally Robust Optimization with self-supervised learning to leverage unlabeled out-of-domain data, improving generalization in semi-supervised classification, especially under distributional shifts.
Contribution
The paper presents a novel DRO-based semi-supervised framework that uses unlabeled out-of-domain data to tighten generalization bounds for classification.
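For orientation, a generic DRO objective can be written as below. This is a sketch in assumed notation, not the paper's exact formulation: the loss $\ell$, the empirical distribution $\hat{P}_m$ of the labeled samples, the discrepancy $D$ between distributions, and the radius $\epsilon$ are all illustrative symbols that do not appear in this summary.

```latex
\hat{\theta}_{\mathrm{DRO}}
  \in \arg\min_{\theta}\;
  \sup_{Q \,:\, D(Q,\hat{P}_m) \le \epsilon}\;
  \mathbb{E}_{(x,y)\sim Q}\bigl[\ell(\theta; x, y)\bigr]
```

When $D$ is a metric and $\epsilon = 0$, the inner supremum ranges over $\hat{P}_m$ alone and the objective reduces to ordinary ERM, which is the baseline the findings compare against.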
Findings
Out-of-domain unlabeled data can significantly reduce generalization error.
The proposed method outperforms ERM on Gaussian mixture models.
Experimental validation on synthetic and real datasets confirms theoretical improvements.
Abstract
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems, where scenarios involving the minimization of either i) adversarially robust or ii) non-robust loss functions have been considered. Notably, we allow the unlabeled samples to deviate slightly (in total variation sense) from the in-domain distribution. The core idea behind our framework is to combine Distributionally Robust Optimization (DRO) with self-supervised training. As a result, we also leverage efficient polynomial-time algorithms for the training stage. From a theoretical standpoint, we apply our framework to the classification problem of a mixture of two Gaussians in $\mathbb{R}^d$, where in addition to the $m$ independent and labeled samples from the true distribution, a set of $n$ (usually with $n \gg m$) out-of-domain and unlabeled samples are given as well. Using only…
Peer Reviews
Decision: ICLR 2024 spotlight
- The paper analyzes the generalization error of the newly introduced algorithm, which utilizes self-supervised learning and adversarially robust optimization, demonstrating an improvement in error compared to traditional ERM.
- It provides experimental results corroborating their theoretical findings that unlabeled samples from a perturbed distribution can reduce the test error.
- The paper's linear Gaussian mixture model is very restrictive.
- The manuscript dedicates a substantial portion to discussing established definitions and findings in the literature. In contrast, the final three pages primarily center on discussing the paper's contributions.
While multiple works have proposed algorithms that show the advantage of unlabeled data for obtaining robust classifiers with high accuracy, this paper's contribution lies in providing an algorithm for the distributionally robust framework that has theoretical guarantees for the linear classification for the Gaussian mixture model case. The latter has also been studied in other robustness frameworks thus highlighting the significance of studying it in a different robustness setting. The algorith
A few correctable weaknesses follow; addressing them could help strengthen the paper: 1) The comparison with related works isn't thorough, in the sense that the paper mentions these related works but doesn't provide any comparison of the current work with them. For example, the paper doesn't mention that the works of Carlini et al., Carmon et al., etc. were for the adversarial robustness setting. It would make the contributions stand out more clearly if there is precise comparison as to how earlier work is diffe
The method pushes the classifier to avoid crowded areas, which is similar in spirit to large margin methods. Making use of unlabeled data seems to improve its ability to do this. The analysis provides a novel non-asymptotic learning bound for Gaussian Mixtures. The method also has well motivated controls for dialing in the bias-variance trade-off.
How does this method compare against large-margin based methods? What if one treats the "slightly out-of-distribution" data as in distribution? Some comparison with similar approaches is warranted. "given" is misspelled in the abstract. While I did not work through all the details, the "new set" of bounds appear to be based heavily on an upper bound of the Rademacher complexity. I feel this really ought to be stated in the abstract and main body of the paper. I don't think that comparing Rade
Taxonomy
Topics: Spectroscopy Techniques in Biomedical and Chemical Research · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
