Semformer: Transformer Language Models with Semantic Planning

Yongjing Yin; Junran Ding; Kai Song; Yue Zhang

arXiv:2409.11143 · cs.CL · October 28, 2024

TL;DR

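Semformer adds explicit semantic planning to Transformer language models: planning tokens placed in the prefix are trained to predict a latent semantic representation of the response. This mitigates the shortcut learning induced by teacher forcing and yields near-perfect results on a minimal planning task (graph path-finding), along with gains on downstream NLP applications.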

Contribution

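The paper proposes a training method for Transformer language models that explicitly models semantic planning: a sequence of planning tokens is inserted into the prefix, and their representations are trained to predict latent semantic representations of the response induced by an autoencoder. This targets the shortcut learning caused by teacher forcing and improves performance on several tasks.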
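To make the input layout concrete, here is a minimal sketch of how planning tokens could be inserted between the prefix and the response. It assumes a single reserved planning-token id (the hypothetical `PLAN_ID`), a fixed number of planning tokens, and a next-token loss restricted to response tokens; `build_example` and `SemformerExample` are illustrative names, not the authors' code, and the paper's exact masking may differ.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical reserved id for the planning token; the real vocabulary id is not given in the source.
PLAN_ID = -1


@dataclass
class SemformerExample:
    input_ids: List[int]        # prefix + planning tokens + response
    lm_loss_mask: List[bool]    # True where the token is a next-token prediction target (assumption: response only)
    plan_positions: List[int]   # positions whose hidden states are regressed onto the response latent


def build_example(prefix: List[int], response: List[int], num_plan: int) -> SemformerExample:
    """Insert `num_plan` planning tokens between the prefix and the response."""
    input_ids = prefix + [PLAN_ID] * num_plan + response
    start = len(prefix)
    plan_positions = list(range(start, start + num_plan))
    # Prefix and planning tokens are not treated as prediction targets in this sketch.
    lm_loss_mask = [False] * (len(prefix) + num_plan) + [True] * len(response)
    return SemformerExample(input_ids, lm_loss_mask, plan_positions)


ex = build_example([5, 6, 7], [8, 9], num_plan=2)
print(ex.input_ids)       # [5, 6, 7, -1, -1, 8, 9]
print(ex.plan_positions)  # [3, 4]
```

Because the model is causal, every response token can attend to the planning positions, while the planning positions see only the prefix; this is what forces the planning representations to anticipate the response rather than copy it.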

Findings

01. Near-perfect performance on the graph path-finding task

02. Effective mitigation of shortcut learning

03. Improved perplexity and in-context learning results
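The first two findings concern a graph path-finding task in which teacher forcing invites shortcuts. The sketch below assumes a path-star graph (one center node with several disjoint arms), a setup used in prior work on the limits of next-token prediction; whether it matches the paper's exact graph configuration is an assumption, and `make_path_star`, `candidates_per_step` and `plan_by_backtracking` are hypothetical helpers. Under teacher forcing only the first step out of the center is ambiguous; every later node is the unique successor of the revealed previous node, so those tokens can be fit without any planning.

```python
import random


def make_path_star(degree: int, arm_len: int, seed: int = 0):
    """Build a path-star graph: a center node with `degree` disjoint arms of `arm_len` nodes."""
    rng = random.Random(seed)
    nodes = rng.sample(range(100, 1000), degree * arm_len + 1)  # distinct node labels
    center, rest = nodes[0], nodes[1:]
    arms = [rest[i * arm_len:(i + 1) * arm_len] for i in range(degree)]
    edges = []
    for arm in arms:
        path = [center] + arm
        edges += list(zip(path, path[1:]))  # directed edges pointing away from the center
    rng.shuffle(edges)  # the edge list is presented in random order
    goal_arm = rng.randrange(degree)
    return edges, center, arms[goal_arm][-1], [center] + arms[goal_arm]


def candidates_per_step(edges, path):
    """Number of successors the revealed previous node offers at each teacher-forced step."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    return [len(succ.get(prev, [])) for prev in path[:-1]]


def plan_by_backtracking(edges, start, goal):
    """Recover the path by looking ahead to the goal and walking predecessors back to the start."""
    pred = {v: u for u, v in edges}  # every non-center node has exactly one predecessor
    path = [goal]
    while path[-1] != start:
        path.append(pred[path[-1]])
    return path[::-1]


edges, start, goal, path = make_path_star(degree=3, arm_len=4)
print(candidates_per_step(edges, path))                # [3, 1, 1, 1]
print(plan_by_backtracking(edges, start, goal) == path)  # True
```

Only the first step needs information about the goal; the remaining steps are trivially determined by the revealed prefix, which is the kind of spurious fit the abstract describes.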

Abstract

Next-token prediction serves as the dominant component in current neural language models. During the training phase, the model employs teacher forcing, which predicts tokens based on all preceding ground truth tokens. However, this approach has been found to create shortcuts, utilizing the revealed prefix to spuriously fit future tokens, potentially compromising the accuracy of the next-token predictor. In this paper, we introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of the response. Specifically, we incorporate a sequence of planning tokens into the prefix, guiding the planning token representations to predict the latent semantic representations of the response, which are induced by an autoencoder. In a minimal planning task (i.e., graph path-finding), our model exhibits near-perfect performance and effectively…
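The objective described in the abstract combines the usual next-token loss with a term that pulls the planning-token representations toward the autoencoder's latent for the response. A minimal sketch in plain Python follows. It assumes the latents are fixed targets produced by an already-trained encoder, that planning representations have already been projected to the latent dimension, that mean squared error is the distance, and that a weight `alpha` balances the two terms; none of these details is stated in this excerpt, and `semformer_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for one position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def semformer_loss(
    response_logits: List[List[float]],  # teacher-forced logits at each response position
    response_targets: List[int],         # ground-truth response token ids
    plan_reprs: List[List[float]],       # projected hidden states of the planning tokens
    latent_targets: List[List[float]],   # autoencoder latents for the response (assumed fixed)
    alpha: float = 1.0,                  # hypothetical weight on the planning term
) -> float:
    lm = sum(cross_entropy(l, t) for l, t in zip(response_logits, response_targets)) / len(response_targets)
    plan = sum(mse(p, z) for p, z in zip(plan_reprs, latent_targets)) / len(plan_reprs)
    return lm + alpha * plan


# One response position with uniform logits, and a planning vector that misses its latent.
print(semformer_loss([[0.0, 0.0]], [0], [[1.0, 2.0]], [[0.0, 0.0]], alpha=0.5))
```

In a real training loop these quantities would be computed on tensors with automatic differentiation; the plain-Python version only shows how the two terms add up.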

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Dropout