Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild
Vihari Piratla

TL;DR
This paper addresses the challenge of deploying reliable machine learning models in real-world scenarios with distribution shifts by proposing algorithms for robustness, evaluation, and adaptation, including domain generalization and label-efficient performance forecasting.
Contribution
It introduces new training algorithms to enhance domain robustness, methods for estimating accuracy under distribution shifts, and lightweight adaptation techniques using unlabeled data.
Findings
Improved robustness over standard training in certain settings
Proposed accuracy estimation method for distribution shifts
Explored lightweight adaptation with unlabeled data in language tasks
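To make the idea of accuracy estimation under distribution shift concrete, here is a minimal sketch of one standard technique: a self-normalized importance-weighted estimate under covariate shift. It is not the method proposed in this work, which the summary above does not detail. The function name `importance_weighted_accuracy`, the discrete covariate groups, and the toy data are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def importance_weighted_accuracy(groups, correct, target_groups):
    """Estimate target accuracy from labeled source data (illustrative sketch).

    groups:        discrete covariate value of each labeled source example
    correct:       1 if the model was right on that source example, else 0
    target_groups: covariate values of unlabeled target examples

    Assumes covariate shift: accuracy within a group is the same on source
    and target, and every source group has nonzero source frequency.
    """
    src = Counter(groups)
    tgt = Counter(target_groups)
    n_src, n_tgt = len(groups), len(target_groups)
    # Density ratio w(g) = p_target(g) / p_source(g), from empirical frequencies.
    weights = [(tgt[g] / n_tgt) / (src[g] / n_src) for g in groups]
    # Self-normalized importance-weighted average of per-example correctness.
    return sum(w * c for w, c in zip(weights, correct)) / sum(weights)


# Toy data (hypothetical): the model is perfect on group "a", weak on group "b".
source_groups = ["a"] * 4 + ["b"] * 4
source_correct = [1, 1, 1, 1, 1, 0, 0, 0]
# The target distribution over-represents the harder group "b".
target_groups = ["a"] + ["b"] * 3

source_accuracy = sum(source_correct) / len(source_correct)
estimated_target_accuracy = importance_weighted_accuracy(
    source_groups, source_correct, target_groups
)
```

In this toy setup the source accuracy overstates what the estimate predicts for the target, because the target contains proportionally more of the group on which the model is weak. No target labels are used, only target covariates.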
Abstract
Our goal is to improve the reliability of Machine Learning (ML) systems deployed in the wild. ML models perform exceedingly well when test examples are similar to training examples. However, real-world applications are required to perform on any distribution of test examples, and current ML systems can fail silently on test examples with distribution shifts. To improve the reliability of ML models under covariate or domain shift, we propose algorithms that enable models to: (a) generalize to a larger family of test distributions, (b) evaluate accuracy under distribution shifts, and (c) adapt to a target distribution. We study causes of impaired robustness to domain shifts and present algorithms for training domain-robust models. A key source of model brittleness is domain overfitting, which our new training algorithms suppress while encouraging domain-general hypotheses. While we…
Taxonomy
Topics: Machine Learning and Data Classification · Software Reliability and Analysis Research · Software Engineering Research
Methods: fail · Test
