Context is Environment

Sharut Gupta; Stefanie Jegelka; David Lopez-Paz; Kartik Ahuja

arXiv:2309.09888 · cs.LG · September 21, 2023

TL;DR

This paper proposes that viewing context as environment and leveraging in-context learning can significantly improve out-of-distribution generalization in AI models, supported by theory and experiments.

Contribution

It introduces In-Context Risk Minimization (ICRM), a novel algorithm that uses in-context learning to better adapt to test environments and improve domain generalization.
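The idea of adapting to a test environment from unlabeled context can be illustrated with a toy sketch. This is not the authors' implementation: the data model (a per-environment additive `shift`), the function names, and the hand-written running-mean rule are all assumptions made for illustration, standing in for whatever a trained in-context learner would extract from its context.

```python
import random

def make_environment(shift, n, rng):
    """Draw n (x, y) pairs: latent z ~ N(0, 1), label y = [z > 0], observed x = z + shift.
    The additive shift is a hypothetical stand-in for an 'environment'."""
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        data.append((z + shift, int(z > 0)))
    return data

def predict_without_context(x):
    # Context-free rule: one fixed threshold shared by all environments.
    return int(x > 0.0)

def predict_with_context(x, context):
    # Context-aware rule: threshold at the mean of the unlabeled inputs seen so far
    # in this environment; falls back to the fixed threshold when context is empty.
    if not context:
        return predict_without_context(x)
    return int(x > sum(context) / len(context))

def evaluate(data):
    """Return (accuracy without context, accuracy with context) on one stream."""
    context, hits_plain, hits_ctx = [], 0, 0
    for x, y in data:
        hits_plain += predict_without_context(x) == y
        hits_ctx += predict_with_context(x, context) == y
        context.append(x)  # only the input is stored, never the label
    n = len(data)
    return hits_plain / n, hits_ctx / n

rng = random.Random(0)
test_env = make_environment(shift=3.0, n=500, rng=rng)  # an unseen, shifted environment
acc_plain, acc_ctx = evaluate(test_env)
print(f"no context: {acc_plain:.2f}, with context: {acc_ctx:.2f}")
```

In this toy setting the fixed threshold is close to chance on the shifted environment, while the context-aware rule recovers as unlabeled inputs accumulate. The sketch only shows why unlabeled context can carry environment information; it says nothing about the paper's reported results.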

Findings

01 · ICRM outperforms baseline methods in out-of-distribution tests

02 · Paying attention to unlabeled context improves environment adaptation

03 · Theoretical analysis supports the effectiveness of in-context learning for generalization

Abstract

Two lines of work are taking the central stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the bitter lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in-context, generalizing on-the-fly to eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context (unlabeled examples as they arrive) allows our proposed…

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 · accept, good paper · Confidence 3

Strengths

This paper is extremely timely and would be of great interest to researchers on domain generalization and group robustness. I strongly recommend that this paper be accepted.
* The authors clearly motivate, present from first principles, and connect modern research in domain generalization and in-context learning. Their notation and exposition in Sections 2 and 3 are thoughtful and clear. Abstract concepts are made clear using examples, such as the self-driving car in Section 2 and sentence exa

Weaknesses

I am happy to consider raising my score if the authors address the below concerns.
1. **Choice of datasets for evaluation + including experiments on "harder" datasets with spurious correlations that vary across environments**.
   * My basic sense of the datasets that the authors chose to benchmark the value of their domain generalization method on is that there is arguably not *that* high variance in what semantic features are present across environments (e.g., simple rotations or corruptions ar

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 2

Strengths

The writing in the article flows smoothly, and the experiments seem to yield very promising results.

Weaknesses

The combination of domain generalization and in-context learning is an ambitious idea. However, the theoretical and experimental discussions in this paper are not sufficient. I find that the motivation for integrating these two concepts does not fully convince me. It appears that this paper only utilizes information from observed data samples x, which I believe is not entirely consistent with the current concept of in-context learning in LLMs, because a sequence formed solely from observed train

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

- I think this is an ambitious paper and connects two interesting areas within OOD-type research
- there are definitely interesting insights gained from the new setup and the idea of receiving a sequence of new-domain examples
- empirical results are strong, demonstrate a good use of unlabelled data
- experiments are pretty thoroughly done, I appreciate the supplementary studies in Table 3, Fig 2, and Fig 5
- the new method is described clearly and the theoretical results seem useful: in particu

Weaknesses

- I find myself somewhat confused by the analogy between LLMs and ICRM. The paper makes it seem as though these should map 1:1, but I can't quite make it clear to myself; perhaps the authors can clarify. It's not clear what the sequence of Xs that arrive corresponds to in LLMs, since they are listed as being selected at random at training time: this means they can't be language tokens, and if they are unrelated sequences I don't see how they correspond with the notion of context laid out in the "g

Videos

Context is Environment · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications