Unraveling Indirect In-Context Learning Using Influence Functions

Hadi Askari; Shivanshu Gupta; Terry Tong; Fei Wang; Anshuman Chhabra; Muhao Chen

arXiv:2501.01473·cs.LG·October 3, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Indirect In-Context Learning, utilizing Influence Functions for improved demonstration selection in diverse scenarios, leading to better accuracy and robustness against noise and adversarial attacks.

Contribution

It proposes a novel Indirect ICL framework that leverages Influence Functions for demonstration selection, enhancing performance in noisy and multi-task settings.

Findings

1. Influence Functions improve demonstration selection accuracy.
2. Combining BSR with IFs yields accuracy gains in 3-shot and 5-shot setups.
3. IF-based selectors significantly reduce attack success rates in adversarial scenarios.

Abstract

In this work, we introduce a novel paradigm for generalized In-Context Learning (ICL), termed Indirect In-Context Learning. In Indirect ICL, we explore demonstration selection strategies tailored for two distinct real-world scenarios: Mixture of Tasks and Noisy ICL. We systematically evaluate the effectiveness of Influence Functions (IFs) as a selection tool for these settings, highlighting the potential of IFs to better capture the informativeness of examples within the demonstration pool. For the Mixture of Tasks setting, demonstrations are drawn from 28 diverse tasks, including MMLU, BigBench, StrategyQA, and CommonsenseQA. We demonstrate that combining BertScore-Recall (BSR) with an IF surrogate model can further improve performance, leading to average absolute accuracy gains of 0.37% and 1.45% for 3-shot and 5-shot setups when compared to traditional ICL metrics. In the Noisy ICL…
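
The combination of a similarity metric with influence scores described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a two-stage order (shortlist by similarity, then rerank by influence), uses precomputed similarity scores as a stand-in for BSR, and approximates the influence of a demonstration on validation loss as the negative dot product of their loss gradients, which corresponds to an influence function with the Hessian replaced by the identity. The function name `two_stage_select` and all toy values are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_select(
    sim_scores: Sequence[float],
    demo_grads: Sequence[Sequence[float]],
    val_grad: Sequence[float],
    k: int,
    prune_to: int,
) -> List[int]:
    """Return indices of k demonstrations from a candidate pool.

    Stage 1 (assumed stand-in for BSR): keep the `prune_to` candidates
    with the highest similarity to the test query.
    Stage 2 (influence proxy, identity-Hessian assumption): score each
    shortlisted candidate by -(val_grad . demo_grad). A more negative
    score means upweighting that candidate is predicted to lower the
    validation loss, so candidates are ranked in ascending order.
    """
    shortlist = sorted(
        range(len(sim_scores)), key=lambda i: sim_scores[i], reverse=True
    )[:prune_to]
    influence = {i: -dot(val_grad, demo_grads[i]) for i in shortlist}
    return sorted(shortlist, key=lambda i: influence[i])[:k]


# Toy pool of four candidates (all numbers are made up for illustration).
sim_scores = [0.9, 0.8, 0.2, 0.7]
demo_grads = [[1.0, 0.0], [-1.0, 0.0], [5.0, 5.0], [1.0, 1.0]]
val_grad = [1.0, 1.0]

selected = two_stage_select(sim_scores, demo_grads, val_grad, k=2, prune_to=3)
print(selected)
```

In this toy pool, candidate 2 has the most helpful influence score but is dropped in the first stage because its similarity is low, which shows how the ordering of the two stages affects the final selection.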

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Indirect ICL is (I think, since it's not formally introduced in the paper; I'm considering it as a mixture of mixtures of tasks and noisy supervision) a task that reflects many real-world use cases of LLMs; this paper is timely and potentially very useful to many practitioners in the field
- Using surrogate DataInf with BERTScore yields the best average accuracy, improving over standard ICL baselines on Llama2, Zephyr, and Mistral
- IF-based averaging also improves accuracy on noisy MRPC/QQP/e

Weaknesses

- There are some sample selection baselines in recent literature (e.g. MoICL, from ACL'25: https://arxiv.org/abs/2411.02830) that have not been compared to the proposed approach
- Are the reported gains statistically significant? At times they seem fairly small
- Surrogate approach still requires fine-tuning on the candidate pool -- what's the computational footprint of the approach in that case?
- Influence scores rely on labeled validation pairs; what's the process for obtaining those in the "

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. A new form of in-context learning is proposed to extend the existing problem to realistic scenarios with task mismatch and noisy labels.
2. The empirical coverage is extensive, and the measurements in this paper cover multiple LLMs and multi-task sets.

Weaknesses

1. The definition of adaptive scenarios is not clear. The paper proposes Indirect ICL as a new task setting, but it is not clear under what “adaptive scenarios” this setting is more meaningful. For example, MoT (Mixture of Tasks) and Noisy ICL are defined, but the paper lacks explanations of the motivation, boundaries of use, and typical applications of these contexts in real systems.
2. Lack of performance on closed-source models. One of the core values of in-context learning is “cross-model g

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- Originality in Problem Formulation: The paper identifies and formalizes a relevant and under-explored problem: ICL when direct, clean task demonstrations are unavailable. The MoT and Noisy ICL settings are practical and important.
- Thorough Empirical Evaluation: The authors conduct extensive experiments across multiple LLMs, datasets, and settings (MoT, noisy, adversarial), which adds credibility to their findings.
- Clarity and Scope: The paper is clearly written, and the exploration of bot

Weaknesses

- Limited Empirical Improvement: The performance gains are consistently small, often within a few percentage points, and their statistical significance is not established. This undermines the claim that IFs provide a substantial benefit.
- Incremental Technical Contribution: The methodology is an application of existing tools rather than a novel algorithmic or theoretical contribution. The two-stage selection process is a simple ensemble of existing techniques.
- Scalability and Cost Concerns:

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Human Pose and Action Recognition