Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling

Deeptanshu Malu; Deevyanshu Malu; Aditya Nemiwal; Sunita Sarawagi

arXiv:2604.01601·cs.LG·April 3, 2026

TL;DR

This paper studies fine-tuning strategies that let large language models retain both in-context learning (ICL) and in-weights learning (IWL), and switch between them depending on how relevant the context is, by using contrastive context sampling.

Contribution

It introduces Contrastive-Context, a simple context-sampling method that enforces similarity-based contrasts, including a mix of similar and random examples within each context, to promote stable ICL-IWL mixtures during training.

Findings

01

Contrastive context sampling improves ICL-IWL balance.

02

Models trained with contrastive context maintain stable in-context and in-weights learning.

03

Empirical results show enhanced performance across multiple tasks and models.

Abstract

We investigate training strategies that co-develop in-context learning (ICL) and in-weights learning (IWL), along with the ability to switch between them based on context relevance. Although current LLMs exhibit both modes, standard task-specific fine-tuning often erodes ICL, which motivates IC-Train: fine-tuning with in-context examples. Prior work has shown that the emergence of ICL after IC-Train depends on factors such as task diversity and training duration. In this paper we show that the similarity structure between target inputs and context examples also plays an important role. Random context leads to loss of ICL and to IWL dominance, while placing only similar examples in context causes ICL to degenerate into copying labels without regard to relevance. To address this, we propose Contrastive-Context, a simple scheme that enforces two types of contrasts: (1) a mix of similar and random examples within a context…
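The first contrast named in the abstract, mixing similar and random examples within one context, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `jaccard` and `sample_mixed_context`, the token-overlap similarity, the toy pool, the counts of similar and random examples, and the final shuffle are all assumptions made for illustration. The second contrast is cut off in the abstract above and is not sketched here.

```python
import random


def jaccard(a, b):
    """Token-overlap similarity between two strings.

    Assumption: a stand-in for whatever similarity measure is actually used.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sample_mixed_context(target, pool, n_similar, n_random, rng):
    """Pick the n_similar nearest examples plus n_random others for one target.

    pool is a list of (input_text, label) pairs; the target itself is
    assumed not to be in pool.
    """
    # Rank the pool from most to least similar to the target input.
    ranked = sorted(pool, key=lambda ex: jaccard(target, ex[0]), reverse=True)
    similar = ranked[:n_similar]
    # Draw the random examples from what is left, so nothing repeats.
    remainder = ranked[n_similar:]
    randoms = rng.sample(remainder, min(n_random, len(remainder)))
    context = similar + randoms
    # Assumption: shuffle so similar examples do not sit at a fixed position.
    rng.shuffle(context)
    return context


# Hypothetical toy data, for illustration only.
pool = [
    ("the movie was great fun", "positive"),
    ("the movie was dull", "negative"),
    ("stock prices fell sharply", "negative"),
    ("a delightful sunny afternoon", "positive"),
    ("the service was slow", "negative"),
    ("prices rose again today", "neutral"),
]
rng = random.Random(0)
context = sample_mixed_context("the movie was great", pool, 2, 2, rng)
for text, label in context:
    print(f"{text} -> {label}")
```

Under these assumptions, each context holds some examples whose labels are likely relevant to the target and some that are not, which is the kind of within-context contrast the abstract describes as discouraging both ignoring the context and blindly copying labels from it.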
