Idea-Gated Transformers: Enforcing Semantic Coherence via Differentiable Vocabulary Pruning

Darshan Fofadiya

arXiv:2512.03343 · cs.CL · December 15, 2025

TL;DR

This paper introduces the Idea-Gated Transformer, which uses a differentiable gating mechanism driven by semantic planning to improve topic coherence and domain retention in language generation, targeting the topic drift that autoregressive models exhibit.

Contribution

The paper presents a novel architecture that separates semantic planning from syntactic generation using an auxiliary Idea Head and a gating mechanism, enhancing controllability and domain retention.
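The Contribution describes an Idea Head whose output gates the main vocabulary, but the gating formula itself is not given here. The sketch below is therefore an assumption, not the paper's method: a soft gate that adds `beta * log(sigmoid(c_v))` to each vocabulary logit `z_v`. Here `c_v` is a hypothetical per-token relevance logit from the Idea Head and `beta` is a hypothetical gate-strength parameter. All function names are illustrative.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def log_sigmoid(x):
    """log(sigmoid(x)) computed without overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def idea_gated_log_probs(lm_logits, idea_logits, beta=1.0):
    """Hypothetical soft vocabulary gate (not the paper's published formula).

    lm_logits:   next-token logits from the main LM head, one per vocab entry.
    idea_logits: per-token relevance logits from an assumed Idea Head.
    beta:        assumed gate strength; beta=0 recovers the ungated model.
    """
    if len(lm_logits) != len(idea_logits):
        raise ValueError("LM head and Idea Head must share the vocabulary")
    # Adding beta*log(sigmoid(c)) multiplies each token's probability by
    # sigmoid(c)**beta before renormalization. Tokens the Idea Head deems
    # irrelevant (c << 0) are suppressed but never hard-masked, so the
    # operation stays differentiable with respect to both inputs.
    gated = [z + beta * log_sigmoid(c) for z, c in zip(lm_logits, idea_logits)]
    return log_softmax(gated)

# Usage: a 4-token toy vocabulary with a flat LM distribution.
probs = [math.exp(v) for v in idea_gated_log_probs([0.0] * 4, [4.0, 4.0, -4.0, -4.0])]
print([round(p, 3) for p in probs])  # [0.491, 0.491, 0.009, 0.009]
```

Working in log space means the gate becomes a simple additive bias on the logits. A soft gate like this lets gradients reach the Idea Head, which a hard top-k mask would not.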

Findings

1. Achieves perplexity comparable to a GPT-2 baseline.

2. Significantly improves domain retention and semantic coherence.

3. Effectively locks generation into specific semantic clusters.

Abstract

Autoregressive Large Language Models (LLMs) trained on Next-Token Prediction (NTP) often suffer from Topic Drift, where generation wanders away from the initial prompt because the model relies on local associations rather than global planning. While scaling model size mitigates this, the fundamental myopia of the NTP objective remains. In this work, we introduce the Idea-Gated Transformer, a novel architecture that separates semantic planning from syntactic generation. An auxiliary Idea Head is trained to predict the bag-of-words distribution of a future context window, producing a latent "Concept Vector" that actively gates the main vocabulary during generation. We propose a differentiable gating mechanism that suppresses semantically irrelevant tokens, effectively pruning the search space in real time. Experiments on WikiText-103 demonstrate that while the Idea-Gated model achieves…
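The abstract says the Idea Head predicts "the bag-of-words distribution for a future context window", but it does not state the window length, how counts are normalized, or which loss is used. The sketch below picks one plausible reading, and every choice in it is an assumption. The target at position `t` is taken as the normalized token counts over the next `window` tokens. The training signal is a soft-target cross-entropy against a softmax over the Idea Head's logits. Both `future_bow_target` and `idea_head_loss` are hypothetical names.

```python
import math
from collections import Counter

def future_bow_target(token_ids, t, window, vocab_size):
    """Bag-of-words distribution over the `window` tokens after position t.

    Hypothetical helper: normalized counts are one plausible reading of
    "bag-of-words distribution"; a multi-hot target is another.
    """
    future = token_ids[t + 1 : t + 1 + window]
    target = [0.0] * vocab_size
    if not future:  # no future context near the end of a sequence
        return target
    for tok, count in Counter(future).items():
        target[tok] = count / len(future)
    return target

def idea_head_loss(idea_logits, target):
    """Soft-target cross-entropy between `target` and softmax(idea_logits)."""
    m = max(idea_logits)
    lse = m + math.log(sum(math.exp(x - m) for x in idea_logits))
    return -sum(p * (x - lse) for p, x in zip(target, idea_logits) if p > 0)

# Usage: the next 4 tokens after position 0 are [1, 4, 1, 5].
tokens = [3, 1, 4, 1, 5, 9, 2, 6]
target = future_bow_target(tokens, t=0, window=4, vocab_size=10)
print(target)  # [0.0, 0.5, 0.0, 0.0, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]
```

Because the target ignores word order inside the window, the Idea Head is asked to predict what the upcoming text is about rather than exactly which token comes next. That is the planning signal the abstract contrasts with myopic next-token prediction.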

Code & Models

Models

🤗 DarshanFofadiya/Idea_Gated_Transformers (model)

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning