In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks

Ayush Goel; Arjun Kohli; Sarvagya Somvanshi

arXiv:2602.17171 · cs.LG · February 20, 2026

TL;DR

This paper empirically compares the in-context learning (ICL) capabilities of linear and quadratic attention models on linear regression tasks, analyzing learning quality (MSE), convergence, generalization, and the effect of model depth.

Contribution

It provides a comparative empirical analysis of linear versus quadratic attention mechanisms for in-context learning on regression tasks, highlighting the similarities between the two and the limitations of linear attention.

Findings

01. Linear and quadratic attention models show similar ICL performance on the linear-regression task.

02. Increasing model depth impacts ICL performance differently across architectures.

03. Linear attention models have limitations relative to quadratic attention in this setting.
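As background for the findings above, here is a minimal NumPy sketch of the two attention variants being compared. It assumes a single head, a causal mask, and the simplest linear-attention form (the softmax is dropped, with no kernel feature map or normalization). This summary does not say which linear-attention variant the paper uses, so the function names and design choices here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Causal softmax ("quadratic") attention: builds the full n x n score matrix."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                       # (n, n) pairwise scores
    mask = np.tril(np.ones((n, n), dtype=bool))         # token t sees tokens 0..t
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    """Causal linear attention (assumed variant: no softmax, no feature map).

    Keeps a running d_k x d_v state of summed outer products k_j v_j^T,
    so the cost grows linearly with sequence length.
    """
    n, d = Q.shape
    state = np.zeros((d, V.shape[1]))
    out = np.zeros((n, V.shape[1]))
    for t in range(n):
        state += np.outer(K[t], V[t])   # accumulate k_t v_t^T
        out[t] = Q[t] @ state           # equals sum_{j<=t} (q_t . k_j) v_j
    return out

def linear_attention_masked(Q, K, V):
    """Same computation as linear_attention, written in the n x n masked form."""
    n = Q.shape[0]
    mask = np.tril(np.ones((n, n)))
    return ((Q @ K.T) * mask) @ V
```

The recurrent and masked forms of linear attention return the same output. The recurrent one shows why no n x n matrix is needed once the softmax is removed.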

Abstract

Recent work has demonstrated that transformers and linear attention models can perform in-context learning (ICL) on simple function classes, such as linear regression. In this paper, we empirically study how these two attention mechanisms differ in their ICL behavior on the canonical linear-regression task of Garg et al. We evaluate learning quality (MSE), convergence, and generalization behavior of each architecture. We also analyze how increasing model depth affects ICL performance. Our results illustrate both the similarities and limitations of linear attention relative to quadratic attention in this setting.
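To make the evaluation setting concrete, the sketch below generates in-context linear-regression prompts and computes an MSE-versus-context-length curve. It assumes the commonly described setup for this task (isotropic Gaussian inputs and weights, noiseless targets) and uses ordinary least squares as a stand-in predictor. The helper names, dimensions, and prompt counts are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_prompt(n_points, dim, rng):
    """Draw one task: w ~ N(0, I), x_i ~ N(0, I), y_i = w . x_i (assumed noiseless)."""
    w = rng.standard_normal(dim)
    xs = rng.standard_normal((n_points, dim))
    ys = xs @ w
    return xs, ys, w

def least_squares_predict(xs_ctx, ys_ctx, x_query):
    """Baseline predictor: minimum-norm least-squares fit on the context examples."""
    w_hat, *_ = np.linalg.lstsq(xs_ctx, ys_ctx, rcond=None)
    return x_query @ w_hat

def icl_mse_curve(predict, n_points, dim, n_prompts, seed=0):
    """Mean squared error on point k given the first k points, for k = 1..n_points-1."""
    rng = np.random.default_rng(seed)
    errs = np.zeros(n_points - 1)
    for _ in range(n_prompts):
        xs, ys, _ = sample_prompt(n_points, dim, rng)
        for k in range(1, n_points):
            pred = predict(xs[:k], ys[:k], xs[k])
            errs[k - 1] += (pred - ys[k]) ** 2
    return errs / n_prompts

# Example: entry k-1 of the curve is the error with k in-context examples.
curve = icl_mse_curve(least_squares_predict, n_points=12, dim=5, n_prompts=20)
```

A trained model would be plugged in through the `predict` argument. With noiseless targets, the least-squares baseline drops to near-zero error once the number of context examples exceeds the input dimension, which gives a reference curve for judging either attention model.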

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Face recognition and analysis