Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers

Won-Gi Paeng; Daesuk Kwon; Kyungwon Jeong; Honggyo Suh

arXiv:2405.04620·hep-ph·May 2, 2025

TL;DR

This paper reformulates the Transformer within the Path Integral formalism, yielding more effective long-term information retention and memory usage that scales linearly with sequence length, validated on the Passkey retrieval task and a summarization task.

Contribution

It reinterprets Transformer components within Path Integral formalism, resulting in a more compact, memory-efficient architecture for long sequences.

Findings

01. Memory usage scales linearly with sequence length

02. Preserves historical information effectively

03. Outperforms standard attention in long-term retention

Abstract

In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach through the Passkey retrieval task and a summarization task, demonstrating that the proposed…
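
The idea of condensing context into memory-like segments that are processed recurrently can be illustrated with a toy example. The sketch below is not the paper's construction: it is a generic segment-level recurrence in plain Python, with no learned weights, in which keys and values are the raw vectors. The names `attend`, `process_in_segments`, `segment_len`, and `memory_slots` are hypothetical and chosen only for illustration. It shows that when each segment attends only to itself plus a fixed-size memory, the number of attention scores held at any one time stays bounded regardless of sequence length.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    # Scaled dot-product attention with keys = values = context.
    # Assumption: no learned projections, purely illustrative.
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, context))
                    for d in range(dim)])
    return out


def process_in_segments(tokens, segment_len, memory_slots):
    # Hypothetical recurrence: carry at most `memory_slots` vectors forward.
    memory = []
    outputs = []
    peak_scores = 0
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        context = memory + segment
        hidden = attend(segment, context)
        outputs.extend(hidden)
        # Size of the score matrix for this segment.
        peak_scores = max(peak_scores, len(segment) * len(context))
        # Keep only the most recent hidden states as the next memory.
        memory = (memory + hidden)[-memory_slots:]
    return outputs, memory, peak_scores


tokens = [[float(i % 3), float((i + 1) % 2)] for i in range(16)]
outputs, memory, peak = process_in_segments(tokens, segment_len=4, memory_slots=2)
print(len(outputs), len(memory), peak)
```

In this toy setup the largest score matrix has `segment_len * (memory_slots + segment_len)` entries, whereas full attention over the same sequence would need `len(tokens) ** 2`. How the paper actually forms and updates its memory segments is not specified in the abstract, so the truncation rule used here is only a placeholder.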

Taxonomy

Topics: Neural Networks and Applications · Cognitive Computing and Networks · Cellular Automata and Applications

Methods: Attention Is All You Need · Softmax · Dropout · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Linear Layer · Layer Normalization · Label Smoothing · Residual Connection