Probing In-Context Learning: Impact of Task Complexity and Model Architecture on Generalization and Efficiency

Binwen Liu; Peiyu Xu; Quan Yuan; Yihong Chen

arXiv:2505.06475·cs.LG·May 13, 2025



TL;DR

This paper systematically examines how task complexity and model architecture influence in-context learning, revealing architecture-specific strengths and the importance of curriculum learning for complex tasks.

Contribution

It introduces two new tasks, Gaussian kernel regression and nonlinear dynamical systems, and evaluates four architectures on them, showing how model design affects ICL performance and generalization.

Findings

01 Transformer performs robustly across tasks
02 Mamba excels in temporal dynamics
03 Hyena captures long-range dependencies

Abstract

We investigate in-context learning (ICL) through a meticulous experimental framework that systematically varies task complexity and model architecture. Extending beyond the linear regression baseline, we introduce Gaussian kernel regression and nonlinear dynamical system tasks, which emphasize temporal and recursive reasoning. We evaluate four distinct models: a GPT2-style Transformer, a Transformer with FlashAttention mechanism, a convolutional Hyena-based model, and the Mamba state-space model. Each model is trained from scratch on synthetic datasets and assessed for generalization during testing. Our findings highlight that model architecture significantly shapes ICL performance. The standard Transformer demonstrates robust performance across diverse tasks, while Mamba excels in temporally structured dynamics. Hyena effectively captures long-range dependencies but shows higher…
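The task setup described in the abstract can be illustrated with a minimal sketch of how a synthetic ICL prompt might be built: a hidden function is sampled, and the model sees a handful of (x, y) pairs plus one query input. Everything here is an assumption made for illustration, not the authors' code: the function names (`sample_linear_task`, `sample_kernel_task`, `make_prompt`), the standard-normal sampling, and the sum-of-Gaussian-bumps form used for the kernel regression task. The nonlinear dynamical system task is omitted because the abstract does not specify it.

```python
import math
import random


def sample_linear_task(dim, rng):
    """Draw a hidden weight vector w; the task is y = w . x (assumed form)."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def sample_kernel_task(dim, rng, n_centers=5, sigma=1.0):
    """Draw hidden centers and coefficients; the task is a sum of
    Gaussian bumps (a hypothetical stand-in for kernel regression)."""
    centers = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(n_centers)]
    alphas = [rng.gauss(0.0, 1.0) for _ in range(n_centers)]

    def f(x):
        total = 0.0
        for a, c in zip(alphas, centers):
            sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            total += a * math.exp(-sq_dist / (2.0 * sigma ** 2))
        return total

    return f


def make_prompt(task_sampler, dim=4, n_examples=8, seed=0):
    """Return (context pairs, query input, held-out target) for one task."""
    rng = random.Random(seed)
    f = task_sampler(dim, rng)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
          for _ in range(n_examples + 1)]
    ys = [f(x) for x in xs]
    context = list(zip(xs[:-1], ys[:-1]))
    return context, xs[-1], ys[-1]
```

A model trained from scratch in this style would receive `context` and the query input and be scored on how closely its prediction matches the held-out target; swapping `sample_linear_task` for `sample_kernel_task` changes the task family without touching the prompt format.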


Code & Models

Repositories

Binwen6/CS182_PROJECT_2025 (PyTorch, official)


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Linear Regression · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Mamba: Linear-Time Sequence Modeling with Selective State Spaces