Iterative Forward Tuning Boosts In-Context Learning in Language Models
Jiaxi Yang, Binyuan Hui, Min Yang, Bailin Wang, Bowen Li, Binhua Li, Fei Huang, Yongbin Li

TL;DR
This paper introduces Deep-Thinking, a two-stage iterative framework that improves in-context learning in large language models by running multiple rounds of information processing over the demonstrations, without any additional training.
Contribution
The study proposes a novel two-stage framework with an iterative attention mechanism that significantly boosts in-context learning performance in LLMs.
Findings
Outperforms vanilla ICL methods across benchmarks
Effective on tasks where demonstration selection is difficult
Enhances understanding without additional training
Abstract
Despite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single iteration of demonstrations processing can generalize effectively to a given test sample. However, this perspective overlooks the potential benefits derived from multiple iterations involving demonstrations, a practice aligning more closely with the iterative decision-making process exhibited by humans, who often learn through analogy. In this study, we introduce a novel two-stage framework to boost ICL in LLMs. Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages. The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Test · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Position-Wise Feed-Forward Layer
