ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Siwei Wang; Yifei Shen; Shi Feng; Haoran Sun; Shang-Hua Teng; Wei Chen

arXiv:2405.09220 · cs.LG · November 12, 2024 · 1 citation

TL;DR

This paper investigates how Transformer-based large language models can develop planning abilities through their next-word prediction mechanism, modeling planning as a path-finding task and analyzing their capacity to learn adjacency and reachability matrices.

Contribution

The paper provides a theoretical framework showing that Transformers can perform path-finding by embedding the adjacency and reachability matrices of the graph in their weights, and that gradient-based training learns the adjacency matrix together with a limited form of the reachability matrix.
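To make the path-finding formulation concrete, here is a minimal Python sketch of the two matrices and a next-node rule that combines them. It is a hand-written analogue of what the embedded weights are described as encoding, not the paper's Transformer construction. The function names and the toy graph are hypothetical, and picking the first valid successor stands in for sampling from a model's output distribution.

```python
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]  # adjacency list: node -> list of successors
Matrix = List[List[int]]


def adjacency_matrix(graph: Graph, n: int) -> Matrix:
    """A[i][j] = 1 iff the graph has a directed edge i -> j."""
    a = [[0] * n for _ in range(n)]
    for u, succs in graph.items():
        for v in succs:
            a[u][v] = 1
    return a


def reachability_matrix(adj: Matrix) -> Matrix:
    """R[i][j] = 1 iff j is reachable from i via one or more edges (Warshall's algorithm)."""
    n = len(adj)
    r = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = 1
    return r


def find_path(adj: Matrix, reach: Matrix, source: int, target: int,
              max_len: Optional[int] = None) -> Optional[List[int]]:
    """Next-node rule: step to a successor of the current node (adjacency row)
    that is the target or can still reach it (reachability column)."""
    n = len(adj)
    limit = n if max_len is None else max_len
    path, current = [source], source
    while current != target and len(path) <= limit:
        candidates = [v for v in range(n)
                      if adj[current][v] and (v == target or reach[v][target])]
        if not candidates:
            return None  # dead end: no successor leads to the target
        current = candidates[0]  # deterministic stand-in for sampling
        path.append(current)
    return path if current == target else None


# Hypothetical toy DAG with 5 nodes.
toy: Graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
A = adjacency_matrix(toy, 5)
R = reachability_matrix(A)
print(find_path(A, R, 0, 4))
print(find_path(A, R, 1, 2))
```

On this toy graph the rule returns [0, 1, 3, 4] for source 0 and target 4, and None for source 1 and target 2, since no successor of node 1 can reach node 2.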

Findings

01

Transformers can embed adjacency and reachability matrices within their weights.

02

They learn the adjacency matrix and a limited form of the reachability matrix through gradient-based training.

03

Current Transformer architectures cannot infer reachability through transitivity, which limits their ability to concatenate known paths into new ones.
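Finding 03 can be illustrated with a toy Python example. The sketch assumes a simplified notion of "observed" reachability: node u is recorded as reaching target t only if u appears before t in some training path that ends at t. This rule, the greedy planner, and all names are illustrative assumptions, not the paper's definitions. With the two training paths 0 → 1 → 2 and 2 → 3 → 4, the pair (0, 4) is never observed, so a planner limited to observed reachability cannot concatenate the two paths, while one using the full transitive closure can.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[int, int]


def observed_reachability(paths: List[List[int]]) -> Set[Pair]:
    """Assumed rule: (u, t) is recorded only when u precedes t in a training path ending at t."""
    observed: Set[Pair] = set()
    for path in paths:
        target = path[-1]
        for u in path[:-1]:
            observed.add((u, target))
    return observed


def transitive_closure(edges: Set[Pair]) -> Set[Pair]:
    """All (u, v) with v reachable from u, found by repeatedly joining pairs (a, b), (b, d)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def plan(edges: Set[Pair], reach: Set[Pair], source: int, target: int,
         max_steps: int = 50) -> Optional[List[int]]:
    """Greedy planner: move to a successor that is the target or is marked as reaching it."""
    succ: Dict[int, List[int]] = {}
    for u, v in sorted(edges):
        succ.setdefault(u, []).append(v)
    path, current = [source], source
    while current != target and len(path) <= max_steps:
        options = [v for v in succ.get(current, []) if v == target or (v, target) in reach]
        if not options:
            return None  # no successor is known to lead to the target
        current = options[0]
        path.append(current)
    return path if current == target else None


# Hypothetical training data: two paths that share node 2.
train = [[0, 1, 2], [2, 3, 4]]
edges = {(p[i], p[i + 1]) for p in train for i in range(len(p) - 1)}
observed = observed_reachability(train)
closure = transitive_closure(edges)

print(plan(edges, observed, 0, 2))  # pair seen in training: succeeds
print(plan(edges, observed, 0, 4))  # needs concatenation: fails
print(plan(edges, closure, 0, 4))   # full reachability: succeeds
```

The only difference between the last two calls is the reachability set, which isolates transitivity as the missing ingredient in this toy setting.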

Abstract

Planning is a crucial element of both human intelligence and contemporary large language models (LLMs). In this paper, we initiate a theoretical investigation into the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms. We model planning as a network path-finding task, where the objective is to generate a valid path from a specified source node to a designated target node. Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency and a limited form of the reachability matrices. These theoretical insights are then validated through experiments, which demonstrate that Transformer architectures indeed learn…
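Because the task is posed as next-word prediction, each training example is a token sequence. The sketch below assumes a layout in which the source and target are written first, followed by the path itself and a newline token; treat this layout, the helper names, and the toy DAG as assumptions for illustration. It samples a random source-to-target path in a small DAG and checks whether a sequence encodes a valid path, which is the success criterion the abstract describes.

```python
import random
from typing import Dict, List

Graph = Dict[int, List[int]]


def reaches(graph: Graph, u: int, t: int) -> bool:
    """Depth-first search: True if t is reachable from u (including u == t)."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == t:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(graph.get(x, []))
    return False


def sample_path(graph: Graph, s: int, t: int, rng: random.Random) -> List[int]:
    """Random walk from s that only steps to successors still able to reach t.
    Assumes a DAG and that t is reachable from s."""
    path = [s]
    while path[-1] != t:
        options = [v for v in graph.get(path[-1], []) if reaches(graph, v, t)]
        path.append(rng.choice(options))
    return path


def to_sequence(path: List[int]) -> List[str]:
    """Assumed token layout: source, target, then the path, then a newline token."""
    return [str(path[0]), str(path[-1])] + [str(v) for v in path] + ["\n"]


def is_valid_path(graph: Graph, tokens: List[str]) -> bool:
    """True if the tokens after the 's t' header form an edge-by-edge path from s to t."""
    body = [int(x) for x in tokens if x != "\n"]
    s, t, path = body[0], body[1], body[2:]
    return (len(path) >= 2 and path[0] == s and path[-1] == t
            and all(b in graph.get(a, []) for a, b in zip(path, path[1:])))


toy: Graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}  # hypothetical DAG
seq = to_sequence(sample_path(toy, 0, 4, random.Random(0)))
print(seq)
print(is_valid_path(toy, seq))
print(is_valid_path(toy, ["0", "4", "0", "4", "\n"]))  # edge 0 -> 4 does not exist
```

Restricting each step to successors that can still reach the target keeps every sampled sequence valid, so a model trained on such data sees only correct paths.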

Videos

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding