Is In-Context Universality Enough? MLPs are Also Universal In-Context
Anastasis Kratsios, Takashi Furuya

TL;DR
This paper proves that MLPs with trainable activation functions are also universal in-context learners, suggesting that transformers' success is due to factors other than in-context universality, such as inductive bias or training stability.
Contribution
It proves that MLPs with trainable activation functions are also universal in-context, challenging the idea that this property alone explains transformers' effectiveness.
Findings
MLPs with trainable activation functions are universal in-context learners.
Transformers' advantage may stem from other factors, such as inductive bias or training stability.
In-context universality is not unique to transformers.
Abstract
The success of transformers is often linked to their ability to perform in-context learning. Recent work shows that transformers are universal in-context, capable of approximating any real-valued continuous function of a context (a probability measure over the input space) and a query point. This raises the question: Does in-context universality explain their advantage over classical models? We answer this in the negative by proving that MLPs with trainable activation functions are also universal in-context. This suggests the transformer's success is likely due to other factors like inductive bias or training stability.
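The approximation property described in the abstract can be written out roughly as follows. This is only a sketch under assumptions the abstract does not state: the symbols \mathcal{X}, \mathcal{P}(\mathcal{X}), \mathcal{K}, f, \hat{f}, and \varepsilon are introduced here for illustration, and uniform approximation on a compact set \mathcal{K} of context-query pairs is assumed. The paper's exact hypotheses (for example the topology placed on the space of measures) may differ.

```latex
% Illustrative notation (assumed, not taken from the abstract):
%   \mathcal{X}               input space
%   \mathcal{P}(\mathcal{X})  probability measures on \mathcal{X} (the contexts)
%   f                         continuous target function of (context, query)
%   \hat{f}                   a model from the class under study (transformer or MLP)
\[
  f \colon \mathcal{P}(\mathcal{X}) \times \mathcal{X} \to \mathbb{R}
  \quad \text{continuous},
  \qquad
  \mathcal{K} \subseteq \mathcal{P}(\mathcal{X}) \times \mathcal{X}
  \quad \text{compact}.
\]
% A model class is "universal in-context" (in this sketch) if, for every
% such f, every such \mathcal{K}, and every \varepsilon > 0, it contains
% some \hat{f} with
\[
  \sup_{(\mu, x) \in \mathcal{K}}
  \bigl| f(\mu, x) - \hat{f}(\mu, x) \bigr| < \varepsilon .
\]
```

Read this way, the abstract's claim is that the class of MLPs with trainable activation functions satisfies the same kind of statement as the class of transformers, so the property alone does not separate the two.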
Taxonomy
Topics: Language and cultural evolution
