Leviathan: Decoupling Input and Output Representations in Language Models

Reza T. Batley; Sourav Saha

arXiv:2601.22040 · cs.CL · May 8, 2026

TL;DR

Leviathan introduces a novel Transformer architecture that decouples input and output representations, leading to improved language modeling performance, especially on rare tokens, with minimal additional parameters.

Contribution

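The paper presents Leviathan, a Transformer architecture that replaces the input embedding matrix with learned embedding vectorization (LEV), a compact continuous mapping from token indices to embeddings. The output head remains a separate, untied matrix, and the model improves on standard tied-embedding baselines for a parameter increase as low as 0.2%.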
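To make "a compact mapping from token indices to embeddings" concrete, here is a minimal sketch in plain Python. This page does not describe how LEV is parameterized, so everything below is an illustrative assumption rather than the paper's method: the base-37 digit coordinates, the two-layer `TinyGenerator`, and the sizes `vocab_size`, `d_model`, and `hidden` are all hypothetical. The point is only that a small function of the token index can stand in for a full lookup table with far fewer parameters.

```python
import math
import random


def index_to_coords(token_id, base, n_digits):
    """Write token_id in the given base and rescale each digit to [0, 1].

    Hypothetical coordinate scheme, chosen only for illustration.
    """
    coords = []
    for _ in range(n_digits):
        token_id, digit = divmod(token_id, base)
        coords.append(digit / (base - 1))
    return coords


class TinyGenerator:
    """Hypothetical two-layer MLP mapping coordinates to an embedding."""

    def __init__(self, n_in, hidden, d_model, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(d_model)]
        self.b2 = [0.0] * d_model

    def __call__(self, coords):
        # Hidden layer with tanh, then a linear output layer.
        h = [math.tanh(sum(w * x for w, x in zip(row, coords)) + b)
             for row, b in zip(self.w1, self.b1)]
        return [sum(w * x for w, x in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def num_params(self):
        return (len(self.w1) * len(self.w1[0]) + len(self.b1)
                + len(self.w2) * len(self.w2[0]) + len(self.b2))


# Hypothetical sizes; 37**3 = 50653 covers a 50,000-token vocabulary.
vocab_size, d_model, hidden = 50_000, 64, 32
base, n_digits = 37, 3

generator = TinyGenerator(n_digits, hidden, d_model)
embedding = generator(index_to_coords(12345, base, n_digits))

table_params = vocab_size * d_model        # parameters in a lookup table
generator_params = generator.num_params()  # parameters in the compact mapping
print(len(embedding), table_params, generator_params)
```

In a decoupled model of this kind, the output projection would still be an ordinary vocabulary-sized matrix, so the extra cost over a tied baseline is only the generator's parameters.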

Findings

01

Leviathan reduces validation perplexity by 9% at 1.2B scale.

02

It requires 2.1× fewer training tokens to reach the tied baseline's final loss.

03

It achieves a 30% reduction in LAMBADA perplexity.
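The findings above are stated as relative changes, which can be converted into loss differences and token budgets with a little arithmetic. The sketch below assumes the standard definition of perplexity as the exponential of mean cross-entropy in nats; the page does not say how each perplexity was normalized, and the helper names are hypothetical.

```python
import math


def loss_drop_from_ppl_reduction(reduction):
    """Cross-entropy decrease (nats) implied by a relative perplexity reduction.

    With PPL = exp(loss), PPL_new = (1 - reduction) * PPL_old gives
    loss_old - loss_new = -ln(1 - reduction).
    """
    return -math.log(1.0 - reduction)


def token_fraction(speedup):
    """Fraction of the baseline's training tokens needed for a given speedup."""
    return 1.0 / speedup


val_drop = loss_drop_from_ppl_reduction(0.09)      # 9% validation perplexity
lambada_drop = loss_drop_from_ppl_reduction(0.30)  # 30% LAMBADA perplexity
fraction = token_fraction(2.1)                     # 2.1x fewer tokens

print(f"validation loss drop: {val_drop:.3f} nats")
print(f"LAMBADA loss drop:    {lambada_drop:.3f} nats")
print(f"token fraction:       {fraction:.1%}")
```

Under that assumption, a 9% perplexity reduction corresponds to roughly 0.094 nats of loss, and 2.1× fewer tokens means reaching the baseline's final loss with a little under half of its training tokens.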

Abstract

Modern language models use a single matrix for input embedding and output projection. This couples two distinct objectives: token representation and discrimination over a vocabulary. This work introduces Leviathan, a Transformer architecture that replaces the input embedding matrix with learned embedding vectorization (LEV), a compact continuous mapping from token indices to embeddings. Leviathan's output head remains untied for a parameter increase of as low as 0.2%. Under controlled comparisons with identical Transformer backbones, Leviathan consistently improves language modeling performance over standard tied-embedding baselines across a 200M-1.2B parameter regime on The Pile with gains that grow during training. At 1.2B scale, Leviathan reduces validation perplexity by 9%, requires $2.1 \times$ fewer training tokens to reach the tied baseline's final loss, and improves on all six…
