SpeLLM: Character-Level Multi-Head Decoding

Amit Ben-Artzy; Roy Schwartz

arXiv:2507.16323·cs.CL·July 23, 2025

Open Access

TL;DR

SpeLLM introduces a character-level multi-head decoding method that decouples input and output vocabularies, enabling larger output spaces and reducing runtime costs in large language models.

Contribution

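The paper proposes a multi-head decoding approach in which each of $k$ independent linear heads predicts one character of the output string, so the output space can grow without a correspondingly larger output projection layer. It also presents a self-distillation procedure for converting a standard LLM into a SpeLLM.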
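To make the size argument concrete, the following is a minimal arithmetic sketch comparing a single dense output projection with a set of per-character heads. The function names and all numeric values (hidden size, vocabulary size, number of heads, character-set size) are illustrative assumptions, not figures from the paper, and the heads are assumed to be plain bias-free linear layers.

```python
def output_layer_params(hidden_size: int, vocab_size: int) -> int:
    """Weights in one dense projection from hidden state to vocabulary logits (no bias)."""
    return hidden_size * vocab_size


def char_heads_params(hidden_size: int, num_heads: int, charset_size: int) -> int:
    """Weights in num_heads independent linear heads, each scoring one character (no bias)."""
    return num_heads * hidden_size * charset_size


def char_heads_capacity(num_heads: int, charset_size: int) -> int:
    """Number of distinct strings the heads can spell, one character per head."""
    return charset_size ** num_heads


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    hidden, vocab = 4096, 128_000
    k, charset = 10, 256

    print("dense projection weights:", output_layer_params(hidden, vocab))
    print("character-head weights:  ", char_heads_params(hidden, k, charset))
    print("strings representable:   ", char_heads_capacity(k, charset))
```

Under these assumed sizes the dense layer grows linearly with the vocabulary, while the character heads grow linearly with $k$ and the character-set size even though the number of representable strings grows exponentially in $k$.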

Findings

01. Achieves 5.1% average runtime reduction across models

02. Maintains competitive performance on downstream tasks

03. Enables support for underrepresented languages and domains

Abstract

Scaling LLM vocabulary is often used to reduce input sequence length and alleviate attention's quadratic cost. Yet, current LLM architectures impose a critical bottleneck to this procedure: the output projection layer scales linearly with vocabulary size, rendering substantial expansion impractical. We propose SpeLLM, a method that decouples input and output vocabularies by predicting character-level strings through multiple output heads. In SpeLLM, each of the $k$ linear heads predicts a single character simultaneously, enabling the model to represent a much larger output space using smaller, independent linear heads. We present a self-distillation approach for converting a standard LLM to a SpeLLM. Our experiments with four pre-trained LLMs show their SpeLLM variants achieve competitive performance on downstream tasks while reducing runtime by 5.1% on average across models. Our…
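The decoding step described in the abstract can be sketched roughly as follows, using NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the character set, the `<pad>` symbol, the rule of stopping at the first pad, greedy argmax selection, and the names `CHARSET`, `PAD`, and `decode_string` are all hypothetical.

```python
import numpy as np

# Hypothetical character set; the paper's actual alphabet is not given here.
PAD = "<pad>"
CHARSET = list("abcdefghijklmnopqrstuvwxyz") + [PAD]


def decode_string(hidden: np.ndarray, head_weights: np.ndarray) -> str:
    """Spell one output string from a single hidden state.

    hidden:       shape (d,), the final hidden state at one position.
    head_weights: shape (k, d, len(CHARSET)), one linear head per character slot.
    """
    # All k heads are applied to the same hidden state in one batched product.
    logits = np.einsum("d,kdc->kc", hidden, head_weights)
    # Greedy choice per head (an assumption; sampling would also be possible).
    char_ids = logits.argmax(axis=-1)

    chars = []
    for idx in char_ids:
        if CHARSET[idx] == PAD:  # assumed convention: pad ends the string
            break
        chars.append(CHARSET[idx])
    return "".join(chars)


if __name__ == "__main__":
    d, k = 4, 5
    hidden = np.array([1.0, 0.0, 0.0, 0.0])
    weights = np.zeros((k, d, len(CHARSET)))
    # Hand-set weights so the heads spell "cat" followed by padding.
    for slot, ch in enumerate(["c", "a", "t", PAD, PAD]):
        weights[slot, 0, CHARSET.index(ch)] = 1.0
    print(decode_string(hidden, weights))
```

Because every head reads the same hidden state, the $k$ characters are produced in a single forward step rather than one character at a time.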

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification