Soft Head Selection for Injecting ICL-Derived Task Embeddings
Jungwon Park, Jimyeong Kim, Changin Choi, Wonjong Rhee

TL;DR
This paper introduces SITE, a gradient-based method that selects relevant attention heads to improve task embedding injection in large language models, outperforming prior methods across various tasks.
Contribution
The paper presents a novel soft head-selection technique for ICL-derived task embeddings that enhances performance and efficiency in large language models.
Findings
SITE significantly outperforms prior embedding-based methods and few-shot ICL.
SITE uses fewer trainable parameters than PEFT methods.
The approach is effective across 12 LLMs ranging from 4B to 70B parameters.
Abstract
Large language models (LLMs) are commonly adapted to downstream tasks using parameter-efficient fine-tuning (PEFT) or in-context learning (ICL). Recently, ICL-driven embedding-based adaptation has been proposed as a distinct task adaptation paradigm. It derives task-specific embeddings from intermediate activations using few-shot prompts and injects them during inference. Despite its conceptual appeal, this approach has not demonstrated consistent performance gains over PEFT or ICL, and its empirical advantages have been limited in practice. We propose Soft head-selection for ICL-derived Task Embeddings (SITE), a gradient-based method that identifies task-relevant attention heads to enable effective task embedding injection. Across various types of open-ended generation, reasoning, and natural language understanding tasks, SITE significantly outperforms prior embedding-based adaptation…
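The abstract describes three steps: derive per-head task embeddings from activations on few-shot prompts, inject them at inference, and use gradients to decide which attention heads should receive them. The following is a minimal, hypothetical sketch of that idea in plain Python, not the authors' implementation. It assumes that each head's task embedding is the mean of its outputs over the few-shot prompts and that injection is a sigmoid-gated blend between a head's output and its task embedding. It also replaces the LLM's language-modeling loss with a toy linear readout and squared error so the gate gradients can be written analytically. All function names and values are illustrative.

```python
import math

# Hypothetical sketch of soft head selection; NOT the SITE reference code.
# Assumptions: task embedding = mean head output over few-shot prompts,
# injection = sigmoid-gated blend, toy linear readout replaces the LM loss.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vectors(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of one head's outputs across few-shot prompts."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def inject(heads, task_emb, gates):
    """Blend each head output toward its task embedding by sigmoid(gate)."""
    out = []
    for h, t, g in zip(heads, task_emb, gates):
        s = sigmoid(g)
        out.append([(1 - s) * hi + s * ti for hi, ti in zip(h, t)])
    return out


def loss_and_grad(heads, task_emb, gates, w, target):
    """Squared error of a toy readout y = sum_i w . h'_i, plus d(loss)/d(gate_i)."""
    y = sum(dot(w, h) for h in inject(heads, task_emb, gates))
    err = y - target
    grads = []
    for h, t, g in zip(heads, task_emb, gates):
        s = sigmoid(g)
        diff = [ti - hi for ti, hi in zip(t, h)]
        # Chain rule: 2 * err * (w . (t - h)) * sigmoid'(g)
        grads.append(2 * err * dot(w, diff) * s * (1 - s))
    return err * err, grads


def train_gates(heads, task_emb, w, target, steps=200, lr=0.1):
    """Plain gradient descent on the gate logits; heads start half-selected."""
    gates = [0.0] * len(heads)
    for _ in range(steps):
        _, grads = loss_and_grad(heads, task_emb, gates, w, target)
        gates = [g - lr * dg for g, dg in zip(gates, grads)]
    return gates


# Toy usage: two heads with 2-d outputs (all values made up).
few_shot_outputs = [
    [[2.0, 0.0], [4.0, 0.0]],   # head 0 on two few-shot prompts
    [[0.0, -2.0], [0.0, 0.0]],  # head 1 on two few-shot prompts
]
task_emb = [mean_vectors(v) for v in few_shot_outputs]  # [[3.0, 0.0], [0.0, -1.0]]
query_heads = [[1.0, 0.0], [0.0, 1.0]]  # head outputs on a zero-shot query
w, target = [1.0, 1.0], 3.0

loss_before, _ = loss_and_grad(query_heads, task_emb, [0.0, 0.0], w, target)
gates = train_gates(query_heads, task_emb, w, target)
loss_after, _ = loss_and_grad(query_heads, task_emb, gates, w, target)
print("gate weights:", [round(sigmoid(g), 3) for g in gates])
print("loss:", loss_before, "->", loss_after)
```

Parameterizing each head's weight as the sigmoid of an unconstrained logit keeps it in (0, 1) while allowing ordinary gradient descent, which is one common way to make a discrete selection differentiable. The exact parameterization and injection rule used by SITE are not given in this summary.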
