Is In-Context Universality Enough? MLPs are Also Universal In-Context

Anastasis Kratsios; Takashi Furuya

arXiv:2502.03327·stat.ML·February 6, 2025

TL;DR

This paper demonstrates that MLPs are also universal in-context learners, suggesting that transformers' success may be due to factors other than in-context universality, such as inductive bias or training stability.

Contribution

It proves that MLPs with trainable activation functions are also universal in-context, challenging the idea that this property alone explains transformers' effectiveness.

Findings

01

MLPs are universal in-context learners.

02

Transformers' advantage may stem from other factors.

03

In-context universality is not unique to transformers.

Abstract

The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in-context, capable of approximating any real-valued continuous function of a context (a probability measure over $X \subseteq \mathbb{R}^{d}$) and a query $x \in X$. This raises the question: does in-context universality explain their advantage over classical models? We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests the transformer's success is likely due to other factors, such as inductive bias or training stability.
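To make the objects in the abstract concrete, the following is a minimal sketch of what a "function of a context and a query" can look like. It does not reproduce the paper's construction or any MLP architecture. It only illustrates the kind of target being approximated. The helper names `empirical_context` and `target`, the choice of an empirical (finitely supported) measure as the context, and the Gaussian-kernel target are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def empirical_context(points):
    """Represent a context as an empirical measure: equal weight on each point."""
    points = np.asarray(points, dtype=float)
    weights = np.full(len(points), 1.0 / len(points))
    return points, weights

def target(context, query):
    """Toy in-context function f(mu, x): the mean under mu of exp(-||x - y||^2)."""
    points, weights = context
    sq_dist = np.sum((points - query) ** 2, axis=1)
    return float(weights @ np.exp(-sq_dist))

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(5, 2))   # five context points in [0, 1]^2
query = np.array([0.5, 0.5])

value = target(empirical_context(pts), query)
# Reordering the context points leaves the measure, and hence the value, unchanged.
value_shuffled = target(empirical_context(pts[::-1]), query)
```

Because the context enters only through the measure it defines, the output depends on which points are present and with what weight, not on the order in which they are listed. Any model claimed to be universal in-context has to approximate maps of this type.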

Taxonomy

Topics: Language and cultural evolution