AdapShot: Adaptive Many-Shot In-Context Learning with Semantic-Aware KV Cache Reuse

Jie Ou, Jinyu Guo, Shiyao Guo, Yuang Li, Ruiqi Wu, Zhaokun Wang, Wenyi Li, Wenhong Tian

arXiv:2605.03644 · cs.AI · May 6, 2026

TL;DR

AdapShot introduces a dynamic, entropy-based method for optimizing shot counts and employs semantic-aware KV cache reuse to improve efficiency and performance in Many-Shot In-Context Learning with LLMs.

Contribution

AdapShot chooses the number of shots for each query instead of fixing it in advance, and reuses KV caches to reduce the prefill cost of long many-shot contexts.
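Why KV cache reuse matters here can be seen with simple token arithmetic. The sketch below is an assumption-laden illustration, not AdapShot's semantic-aware mechanism: it uses plain prefix caching and assumes demonstrations are concatenated in a fixed order and candidate shot counts are probed in increasing order, so each probe's demonstration block is a prefix of the next one. The function `prefill_tokens` and its parameters are hypothetical.

```python
from typing import Sequence


def prefill_tokens(
    shot_lengths: Sequence[int],
    candidates: Sequence[int],
    query_len: int,
    reuse: bool,
) -> int:
    """Count tokens prefilled when probing every candidate shot count.

    Hypothetical model: demonstrations appear in a fixed order, so the context
    for k shots is a prefix of the context for any larger k. With reuse=True,
    the KV cache of the demonstrations already processed is kept and only the
    newly added demonstrations are prefilled. The query's KV entries are
    discarded before the prefix is extended, so the query is prefilled on
    every probe in both modes.
    """
    total = 0
    cached = 0  # number of leading demonstrations whose KV is already stored
    for k in sorted(candidates):
        start = cached if reuse else 0
        total += sum(shot_lengths[start:k]) + query_len
        cached = k
    return total


if __name__ == "__main__":
    lengths = [100] * 64  # 64 demonstrations of 100 tokens each (made-up sizes)
    probes = [4, 16, 64]
    print(prefill_tokens(lengths, probes, 50, reuse=False))  # 8550
    print(prefill_tokens(lengths, probes, 50, reuse=True))   # 6550
```

In this toy setting, reuse removes the repeated prefill of shared demonstration prefixes. The saving grows with the number of probes and with any final inference pass that can start from the cached prefix.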

Findings

01. Achieves around a 10% performance improvement over state-of-the-art methods.

02. Provides a 4.64x speedup in inference time.

03. Demonstrates effectiveness across extensive experiments.

Abstract

Many-Shot In-Context Learning (ICL) has emerged as a promising paradigm, leveraging extensive examples to unlock the reasoning potential of Large Language Models (LLMs). However, existing methods typically rely on a predetermined, fixed number of shots. This static approach often fails to adapt to the varying difficulty of different queries, leading to either insufficient context or interference from noise. Furthermore, the prohibitive computational and memory costs of long contexts severely limit Many-Shot's feasibility. To address the above limitations, we propose AdapShot, which dynamically optimizes shot counts and leverages KV cache reuse for efficient inference. Specifically, we design a probe-based evaluation mechanism that utilizes output entropy to determine the optimal number of shots. To bypass the redundant prefilling computation during both the probing and inference phases,…
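The probe-based idea of using output entropy to pick a shot count can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a hypothetical callback `probe(k)` that runs the model with the first k demonstrations and returns its output distribution for the query, and it selects the candidate with the lowest entropy (lower entropy read as higher confidence), breaking ties toward fewer shots. The names `entropy` and `select_shot_count` are illustrative.

```python
import math
from typing import Callable, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution, skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_shot_count(
    candidates: Sequence[int],
    probe: Callable[[int], Sequence[float]],
) -> int:
    """Return the candidate shot count whose probe output has the lowest entropy.

    `probe(k)` is a hypothetical stand-in for running the LLM with k shots.
    Candidates are tried in increasing order and a strict '<' comparison is
    used, so ties resolve to the smaller (cheaper) shot count.
    """
    if not candidates:
        raise ValueError("need at least one candidate shot count")
    best_k, best_h = -1, math.inf
    for k in sorted(candidates):
        h = entropy(probe(k))
        if h < best_h:
            best_k, best_h = k, h
    return best_k


if __name__ == "__main__":
    # Made-up output distributions for three candidate shot counts.
    toy = {4: [0.4, 0.3, 0.3], 16: [0.8, 0.1, 0.1], 64: [0.6, 0.2, 0.2]}
    print(select_shot_count(list(toy), lambda k: toy[k]))  # 16
```

Choosing the minimum-entropy count is only one plausible decision rule. A threshold rule (stop at the smallest k whose entropy falls below a cutoff) would probe fewer candidates at the cost of an extra hyperparameter.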
