In-Context Deep Learning via Transformer Models

Weimin Wu; Maojiang Su; Jerry Yao-Chieh Hu; Zhao Song; Han Liu

arXiv:2411.16549·cs.LG·April 15, 2025

TL;DR

This paper shows how transformer models can simulate the gradient descent training of deep neural networks through in-context learning, with theoretical guarantees and validation on synthetic experiments.

Contribution

It presents a constructive method for transformers to perform in-context gradient descent training of deep networks, with theoretical analysis and practical extensions.

Findings

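1. Transformers can simulate gradient descent training of deep networks.

2. The construction comes with theoretical guarantees on approximation error and on the convergence of the ICL gradient descent.

3. On synthetic datasets for 3-layer, 4-layer, and 6-layer networks, ICL performance matches that of direct training.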

Abstract

We investigate the transformer's capability to simulate the training process of deep models via in-context learning (ICL), i.e., in-context deep learning. Our key contribution is a positive example of using a transformer to train a deep neural network by gradient descent implicitly via ICL. Specifically, we provide an explicit construction of a $(2N + 4)L$-layer transformer capable of simulating $L$ gradient descent steps of an $N$-layer ReLU network through ICL. We also give theoretical guarantees for approximation within any given error and for the convergence of the ICL gradient descent. Additionally, we extend our analysis to the more practical setting of Softmax-based transformers. We validate our findings on synthetic datasets for 3-layer, 4-layer, and 6-layer neural networks. The results show that ICL performance matches that of direct training.
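To make the simulated process concrete, the following is a minimal sketch of explicit gradient descent on a small ReLU network, together with the depth count $(2N + 4)L$ quoted in the abstract. It does not reproduce the paper's transformer construction; it only illustrates the training procedure that the transformer is said to simulate in context. The squared-error loss, the linear output layer, and all function names (`transformer_depth`, `gd_step`, and so on) are assumptions made for illustration.

```python
def transformer_depth(n_layers, gd_steps):
    """Depth (2N + 4) * L quoted in the abstract for an N-layer net and L GD steps."""
    return (2 * n_layers + 4) * gd_steps


def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(weights, x):
    """Return (activations, pre-activations); ReLU on hidden layers, linear output (assumed)."""
    acts, pre = [x], []
    for k, W in enumerate(weights):
        z = matvec(W, acts[-1])
        pre.append(z)
        is_last = k == len(weights) - 1
        acts.append(z if is_last else [max(0.0, v) for v in z])
    return acts, pre


def loss(weights, data):
    """Mean of 0.5 * squared error over the dataset (assumed loss)."""
    total = 0.0
    for x, y in data:
        out = forward(weights, x)[0][-1]
        total += 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))
    return total / len(data)


def gd_step(weights, data, lr):
    """One full-batch gradient descent step via backpropagation."""
    grads = [[[0.0] * len(row) for row in W] for W in weights]
    n = len(data)
    for x, y in data:
        acts, pre = forward(weights, x)
        # Gradient of the loss with respect to the (linear) output.
        delta = [o - t for o, t in zip(acts[-1], y)]
        for k in reversed(range(len(weights))):
            for i, d in enumerate(delta):
                for j, a in enumerate(acts[k]):
                    grads[k][i][j] += d * a / n
            if k > 0:
                # Propagate through W_k, then through the ReLU of layer k-1.
                back = [sum(weights[k][i][j] * delta[i] for i in range(len(delta)))
                        for j in range(len(acts[k]))]
                delta = [b if z > 0 else 0.0 for b, z in zip(back, pre[k - 1])]
    # Build new weights; the inputs are left unmodified.
    return [[[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, G)]
            for W, G in zip(weights, grads)]


if __name__ == "__main__":
    weights = [[[0.5]], [[0.5]]]   # hypothetical 2-layer scalar network
    data = [([1.0], [1.0])]        # a single (input, target) pair
    print("loss before:", loss(weights, data))
    for _ in range(5):
        weights = gd_step(weights, data, lr=0.1)
    print("loss after 5 steps:", loss(weights, data))
    print("transformer depth for N=2, L=5:", transformer_depth(2, 5))
```

In this sketch, each call to `gd_step` corresponds to one of the $L$ gradient descent steps; under the abstract's count, simulating each such step of an $N$-layer network costs $2N + 4$ transformer layers.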

Videos

In-Context Deep Learning via Transformer Models · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI
