Ask, and it shall be given: On the Turing completeness of prompting

Ruizhong Qiu; Zhe Xu; Wenxuan Bao; Hanghang Tong

arXiv:2411.01992·cs.LG·February 24, 2025·2 cites

TL;DR

This paper gives a theoretical foundation for LLM prompting: it shows that there is a single finite-size Transformer that, given a suitable prompt, computes any computable function, which establishes that prompting is Turing-complete.

Contribution

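To the authors' knowledge, this is the first theoretical study of the LLM prompting paradigm. It shows that prompting makes a single finite-size Transformer Turing-complete, and that this one Transformer achieves nearly the same complexity bounds as the class of all unbounded-size Transformers.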
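
Read as a statement about quantifier order, the contribution can be sketched informally as follows. The notation is an assumption made here for illustration, not the paper's own: $\Gamma$ is a finite-size Transformer, $\varphi$ a computable function, $\pi_\varphi$ a prompt, and $\Gamma(\pi_\varphi \cdot x)$ the output generated when $\Gamma$ is given the prompt followed by input $x$.

$$\exists\, \Gamma \;\; \forall\, \varphi \text{ computable} \;\; \exists\, \pi_\varphi \;\; \forall\, x : \quad \Gamma(\pi_\varphi \cdot x) = \varphi(x)$$

The Transformer is fixed before the function is chosen; only the prompt depends on $\varphi$. A statement of the form $\forall\, \varphi \;\; \exists\, \Gamma_\varphi$ would instead allow a different Transformer for each function.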

Findings

01 · Prompting makes a finite Transformer Turing-complete.

02 · A single finite Transformer can simulate any computable function.

03 · Prompting enables universal computation with bounded models.

Abstract

Since the success of GPT, large language models (LLMs) have been revolutionizing machine learning and have initiated the so-called LLM prompting paradigm. In the era of LLMs, people train a single general-purpose LLM and provide the LLM with different prompts to perform different tasks. However, such empirical success largely lacks theoretical understanding. Here, we present the first theoretical study on the LLM prompting paradigm to the best of our knowledge. In this work, we show that prompting is in fact Turing-complete: there exists a finite-size Transformer such that for any computable function, there exists a corresponding prompt following which the Transformer computes the function. Furthermore, we show that even though we use only a single finite-size Transformer, it can still achieve nearly the same complexity bounds as that of the class of all unbounded-size Transformers.…
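
The "one fixed machine, many prompts" idea in the abstract can be illustrated with a toy analogy. The sketch below is not the paper's construction and involves no Transformer: it is a plain Turing-machine interpreter in which the fixed interpreter stands in for the single model and a rule string stands in for the prompt. The rule syntax and the names `parse_prompt`, `fixed_machine`, `FLIP_BITS`, and `APPEND_ONE` are hypothetical, chosen only for this example.

```python
from typing import Dict, Tuple

BLANK = "_"
MOVES = {"L": -1, "R": 1, "S": 0}

Rules = Dict[Tuple[str, str], Tuple[str, str, int]]


def parse_prompt(prompt: str) -> Rules:
    """Decode rules of the form 'state,read->state,write,move' separated by ';'."""
    rules: Rules = {}
    for rule in prompt.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        lhs, rhs = rule.split("->")
        state, read = lhs.split(",")
        new_state, write, move = rhs.split(",")
        rules[(state, read)] = (new_state, write, MOVES[move])
    return rules


def fixed_machine(prompt: str, x: str, max_steps: int = 10_000) -> str:
    """A single fixed interpreter: only the prompt changes between tasks."""
    rules = parse_prompt(prompt)
    tape = dict(enumerate(x))  # sparse tape: cell index -> symbol
    state, head, steps = "q0", 0, 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        symbol = tape.get(head, BLANK)
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += move
        steps += 1
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


# Two different "prompts" for the same machine (both hypothetical examples).
FLIP_BITS = "q0,0->q0,1,R; q0,1->q0,0,R; q0,_->halt,_,S"
APPEND_ONE = "q0,1->q0,1,R; q0,_->halt,1,S"

print(fixed_machine(FLIP_BITS, "1011"))   # 0100
print(fixed_machine(APPEND_ONE, "111"))   # 1111
```

Swapping `FLIP_BITS` for `APPEND_ONE` changes the computed function without touching `fixed_machine`, which is the sense in which the prompt plays the role of a program.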

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The computational power of Neural Networks, and especially of Transformers, studied in this work, is an important and interesting topic that has been studied for some time. The authors have shown new results in this field and improved previously known achievements, in particular those of Perez et al. (ICLR 2019 and JMLR 2021) and Merrill & Sabharwal (ICLR, 2024). While Perez et al. (2021) have shown that for any computable function $\varphi$ there exists a Transformer that computes the function,

Weaknesses

The topics discussed in the paper and the methods of proofs are not entirely new, and the main result, although it sheds new light on the issue of computability, is not groundbreaking in this field. Moreover, to claim that this work proves Turing completeness of prompting is in some sense an overinterpretation of the achievements. First, although it is justified by the results of Hahn (2020) and others, a rather negative aspect of the results of this work is hidden in the fact that autoregressi

Reviewer 02 · Rating 5 · Confidence 2

Strengths

This paper is clearly written, and the technical details appear to be sound. The construction of the particular Transformer for simulating 2-PTMs is clever.

Weaknesses

I am an outsider to this area, and am not confident in this evaluation. However, I found it difficult to understand the significance of this result. We, of course, already know that it is possible to unambiguously specify computations with strings in a finite alphabet. We also know that finite machines with infinite tapes can execute these computations. In some sense the "prompting paradigm" (use a string to tell a machine what we want it to do) is also just the "programming paradigm." So the

Reviewer 03 · Rating 5 · Confidence 3

Strengths

+ The paper presents a new reduction that proves Turing-completeness of transformers with hardmax attention.
+ The authors deliver a thorough construction and proof for their Turing-completeness result.

Weaknesses

- The paper lacks sufficient justification for the novelty of its results. The Turing-completeness of transformers with prompting appears to follow naturally from existing work. It’s well-known that a Universal Turing Machine (UTM) can simulate any Turing machine by encoding the machine as part of the input. Given prior work [1], which shows that transformers with hard attention are Turing-complete, it’s intuitive that a family of transformers can simulate a UTM. Consequently, encoding any Turin

Code & Models

Repositories

q-rz/iclr25-prompting-theory · Official

Videos

Ask, and it shall be given: On the Turing completeness of prompting · slideslive

Taxonomy

Topics: Computability, Logic, AI Algorithms

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Position-Wise Feed-Forward Layer · Adam · Attention Dropout · Multi-Head Attention · Weight Decay · Byte Pair Encoding