Enhancing Latent Computation in Transformers with Latent Tokens

Yuchang Sun; Yanxi Chen; Yaliang Li; Bolin Ding

arXiv:2505.12629 · cs.LG · May 20, 2025

Open Access

TL;DR

This paper introduces latent tokens, a lightweight augmentation for Transformer-based language models that improves performance and out-of-distribution generalization by steering decoding through attention mechanisms.

Contribution

The paper proposes latent tokens, a parameter-efficient method for enhancing LLMs that integrates with pre-trained Transformers and can be applied flexibly at inference time.

Findings

1. Latent tokens noticeably outperform the baselines, particularly in out-of-distribution settings.
2. The method integrates with pre-trained Transformers while adding minimal overhead.
3. Synthetic tasks are used to verify the authors' hypotheses about the mechanisms behind latent tokens.

Abstract

Augmenting large language models (LLMs) with auxiliary tokens has emerged as a promising strategy for enhancing model performance. In this work, we introduce a lightweight method termed latent tokens; these are dummy tokens that may be non-interpretable in natural language but steer the autoregressive decoding process of a Transformer-based LLM via the attention mechanism. The proposed latent tokens can be seamlessly integrated with a pre-trained Transformer, trained in a parameter-efficient manner, and applied flexibly at inference time, while adding minimal complexity overhead to the existing infrastructure of standard Transformers. We propose several hypotheses about the underlying mechanisms of latent tokens and design synthetic tasks accordingly to verify them. Numerical results confirm that the proposed method noticeably outperforms the baselines, particularly in the…
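
As a rough illustration of the idea in the abstract, the sketch below interleaves a small group of latent embeddings into a sequence of input token embeddings, so that later positions can attend to them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `insert_latent_tokens`, the fixed insertion period `every_k`, and the choice to place latents before each group of tokens are all hypothetical. In a parameter-efficient setup one would presumably train only the latent embeddings and keep the pre-trained weights frozen; that training step is not shown here.

```python
from typing import List, Tuple

Vector = List[float]


def insert_latent_tokens(
    token_embeddings: List[Vector],
    latent_embeddings: List[Vector],
    every_k: int,
) -> Tuple[List[Vector], List[bool]]:
    """Insert the latent embeddings before every `every_k` input tokens.

    Hypothetical helper: the insertion schedule is an assumption made for
    illustration and is not taken from the paper.

    Returns the augmented sequence and a parallel mask that is True at
    latent positions (useful, for example, to skip them in the loss).
    """
    if every_k <= 0:
        raise ValueError("every_k must be positive")

    augmented: List[Vector] = []
    is_latent: List[bool] = []
    for i, emb in enumerate(token_embeddings):
        # At the start of each group of `every_k` tokens, add the latents.
        if i % every_k == 0:
            augmented.extend(latent_embeddings)
            is_latent.extend([True] * len(latent_embeddings))
        augmented.append(emb)
        is_latent.append(False)
    return augmented, is_latent


# Toy usage: three 2-dimensional token embeddings, one latent embedding,
# inserted before every second token.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.5, 0.5]]
augmented, is_latent = insert_latent_tokens(tokens, latents, every_k=2)
print(is_latent)
```

The returned mask marks which positions hold latent tokens, so a downstream training or decoding loop can treat them differently from ordinary tokens (for instance, by not emitting or scoring them).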

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Explainable Artificial Intelligence (XAI)

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Softmax · Position-Wise Feed-Forward Layer