Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

TL;DR
This paper introduces MoICE, a method that enhances large language models' long-context awareness by adding a router to each attention head that steers attention toward specific contextual positions through the choice of RoPE angles, improving performance without significant efficiency loss.
Contribution
MoICE is a new approach that uses position-specific experts and lightweight training to improve LLMs' long context understanding and generation capabilities.
Findings
MoICE outperforms previous methods on long context tasks.
It maintains high inference efficiency.
It is effective on models such as Llama and Mistral.
Abstract
Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs and a lightweight router-only training optimization strategy: (1) MoICE views each RoPE angle as an "in-context" expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly…
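The abstract is cut off before it describes how a head combines its experts, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the "experts" are a small set of candidate RoPE base values, that the router is a linear map from each query to one logit per candidate, that the top-k candidates are kept per token, and that their attention scores are mixed with the softmax-normalized router weights. The names `rope`, `moice_head_sketch`, `router_w`, `bases`, and `top_k` are hypothetical.

```python
import numpy as np

def rope(x, base):
    """Rotary position embedding (rotate-half layout) for x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)               # (half,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moice_head_sketch(q, k, v, router_w, bases, top_k=2):
    """One causal attention head mixing several RoPE bases (all assumptions, see lead-in).

    q, k, v:   (seq_len, dim) projections for a single head
    router_w:  (dim, n_experts) hypothetical router weights
    bases:     list of n_experts candidate RoPE base values
    """
    seq_len, dim = q.shape

    # Router: one logit per candidate base for every query token.
    logits = q @ router_w                                      # (seq_len, n_experts)

    # Keep only the top_k candidates per token, then normalize their weights.
    kth = np.sort(logits, axis=1)[:, -top_k][:, None]
    gate = softmax(np.where(logits >= kth, logits, -np.inf), axis=1)

    # Attention scores computed separately under each candidate base.
    scores = np.stack(
        [rope(q, b) @ rope(k, b).T for b in bases], axis=0
    ) / np.sqrt(dim)                                           # (n_experts, seq_len, seq_len)

    # Mix the per-expert scores with the router weights, token by token.
    mixed = np.einsum("te,ets->ts", gate, scores)              # (seq_len, seq_len)

    # Causal mask, then the usual softmax over keys.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    attn = softmax(np.where(future, -np.inf, mixed), axis=1)
    return attn @ v, gate
```

In a router-only training setup of the kind the abstract mentions, `router_w` would be the only trainable parameter in this sketch, while the projections that produce `q`, `k`, and `v` stay frozen.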
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Semantic Web and Ontologies · Recommender Systems and Techniques
Methods: Softmax · Attention Is All You Need · LLaMA
