Reconciling In-Context and In-Weight Learning via Dual Representation Space Encoding
Guanyu Chen, Ruichen Wang, Tianren Zhang, Feng Chen

TL;DR
This paper introduces CoQE, a dual-space encoding architecture that improves in-context learning (ICL) and reconciles it with in-weight learning (IWL) by encoding context and samples in separate representation spaces. Both theoretical analysis and empirical results support the approach.
Contribution
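The paper proposes a dual representation space framework and a corresponding architecture, CoQE, which reconcile ICL and IWL in Transformers by encoding the context into a task representation space and the samples into a separate sample representation space.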
Findings
CoQE improves ICL performance on synthetic tasks.
The dual-space model reconciles ICL and IWL.
Theoretical analysis supports the effectiveness of the architecture.
Abstract
In-context learning (ICL) is a valuable capability exhibited by Transformers pretrained on diverse sequence tasks. However, previous studies have observed that ICL often conflicts with the model's inherent in-weight learning (IWL) ability. By examining the representation space learned by a toy model in synthetic experiments, we identify the shared encoding space for context and samples in Transformers as a potential source of this conflict. To address this, we modify the model architecture to separately encode the context and samples into two distinct spaces: a task representation space and a sample representation space. We model these two spaces under a simple yet principled framework, assuming a linear representational structure and treating them as a pair of dual spaces. Both theoretical analysis and empirical results demonstrate the effectiveness of our proposed architecture, CoQE,…
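To make the dual-space idea concrete, here is a toy sketch in Python. It is not CoQE and does not reproduce the authors' architecture; it only borrows the abstract's assumption of a linear representational structure. All names (`sample_encoder`, `context_encoder`, `pair`) are hypothetical. A context encoder maps demonstrations to a task vector, a sample encoder maps a query to a sample vector, and the prediction is the pairing of the two, so the task vector acts as a linear functional on the sample space. The context encoder is swapped in as plain ridge-regularized least squares instead of a Transformer.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Context = Sequence[Tuple[Sequence[float], float]]


def sample_encoder(x: Sequence[float]) -> Vector:
    """Hypothetical sample encoder: a fixed feature map into the sample space
    (the raw input plus a constant bias coordinate)."""
    return [float(v) for v in x] + [1.0]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / m[i][i]
    return w


def context_encoder(context: Context, ridge: float = 1e-6) -> Vector:
    """Hypothetical context encoder: maps (x, y) demonstrations to a task vector
    in the dual of the sample space via ridge-regularized least squares."""
    feats = [sample_encoder(x) for x, _ in context]
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(f[i] * y for f, (_, y) in zip(feats, context)) for i in range(d)]
    return solve(gram, rhs)


def pair(task: Vector, sample: Vector) -> float:
    """Dual pairing <task, sample>: the task vector evaluated as a linear functional."""
    return sum(t * s for t, s in zip(task, sample))


if __name__ == "__main__":
    # Demonstrations drawn from y = 2*x0 - x1 + 0.5.
    ctx = [([0.0, 0.0], 0.5), ([1.0, 0.0], 2.5), ([0.0, 1.0], -0.5),
           ([1.0, 1.0], 1.5), ([2.0, 1.0], 3.5)]
    task = context_encoder(ctx)                       # "in-context" task vector
    print(pair(task, sample_encoder([3.0, 2.0])))     # approximately 4.5
    stored = [2.0, -1.0, 0.5]                         # task vector held as fixed weights
    print(pair(stored, sample_encoder([3.0, 2.0])))   # same pairing, same space
```

With these demonstrations the inferred task vector is close to [2, -1, 0.5], and pairing it with the encoded query [3, 2] gives roughly 4.5. A task vector stored directly as fixed parameters (a loose stand-in for in-weight learning) is paired with samples in exactly the same way; this is only an illustrative reading of how a shared dual-space interface could let the two modes coexist, not a claim from the paper.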
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
