In-Context Learning State Vector with Inner and Momentum Optimization
Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

TL;DR
This paper introduces inner and momentum optimization techniques that refine in-context learning (ICL) state vectors at test time, improving task performance and achieving state-of-the-art results on various tasks with large language models.
Contribution
The paper proposes inner and momentum optimization methods for ICL state vectors, and a divide-and-conquer aggregation approach for lengthy demonstrations.
Findings
Optimization improves state vector quality and task performance.
The optimized state vectors achieve state-of-the-art results on multiple benchmarks.
The approach is effective in both zero-shot and few-shot settings.
Abstract
Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples. Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer. However, the working mechanisms and optimization of these vectors are yet to be thoroughly explored. In this paper, we address this gap by presenting a comprehensive analysis of these compressed vectors, drawing parallels to the parameters trained with gradient descent, and introduce the concept of state vector. Inspired by the works on model soup and momentum-based gradient descent, we propose inner and momentum optimization methods that are applied to refine the state vector progressively as test-time adaptation. Moreover, we simulate state vector aggregation in the multiple example setting, where demonstrations comprising…
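The abstract names two sources of inspiration, model soup and momentum-based gradient descent, without giving formulas. The sketch below only illustrates those two generic ideas on plain lists of floats; it is not the paper's algorithm. The function names `average_vectors` and `momentum_refine`, the `beta` parameter, and the choice to treat the difference between successive state vectors as a pseudo-gradient are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def average_vectors(vectors: Sequence[Vector]) -> Vector:
    """Uniformly average several vectors (a model-soup-style combination).

    Hypothetical stand-in for an "inner" refinement step: assumes each
    vector is a state vector collected from a different demonstration.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(components) / n for components in zip(*vectors)]


def momentum_refine(vectors: Sequence[Vector], beta: float = 0.5) -> Vector:
    """Extrapolate the last vector using a momentum-accumulated update.

    Assumption: the difference between successive vectors plays the role
    of a gradient step, and `beta` is the usual momentum coefficient.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    velocity = [0.0] * len(vectors[0])
    for prev, curr in zip(vectors, vectors[1:]):
        # Classic momentum accumulation: v <- beta * v + (curr - prev)
        velocity = [beta * v + (c - p) for v, p, c in zip(velocity, prev, curr)]
    # Apply the accumulated velocity to the most recent vector.
    return [x + v for x, v in zip(vectors[-1], velocity)]


if __name__ == "__main__":
    states = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
    print(average_vectors(states))
    print(momentum_refine(states, beta=0.5))
```

With a single input vector, `momentum_refine` returns that vector unchanged because no differences are available to accumulate. How the paper actually extracts state vectors from the transformer and where it applies these updates is not specified in the text above.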
Taxonomy
Topics: Fault Detection and Control Systems · EEG and Brain-Computer Interfaces
