Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention
Alon Rozental

TL;DR
Zonkey is a hierarchical diffusion language model with a differentiable tokenizer and probabilistic attention. It trains end to end from raw characters to document-level representations, with the aim of making language models more adaptable and scalable.
Contribution
The paper presents a fully trainable pipeline built on a differentiable tokenizer and probabilistic attention, so that a language model can be optimized end to end from raw text input.
Findings
Generates coherent, variable-length text from noise.
Emergent hierarchies align with linguistic structures.
Outperforms entropy-based tokenizers in data distribution alignment.
Abstract
Large language models (LLMs) have revolutionized natural language processing, yet they remain constrained by fixed, non-differentiable tokenizers like Byte Pair Encoding (BPE), which hinder end-to-end optimization and adaptability to noisy or domain-specific data. We introduce Zonkey, a hierarchical diffusion model that addresses these limitations through a fully trainable pipeline from raw characters to document-level representations. At its core is a differentiable tokenizer (Segment Splitter) that learns probabilistic beginning-of-sequence (BOS) decisions, enabling adaptive splits that emerge as linguistically meaningful (e.g., word boundaries at spaces, sentence starts at periods) without explicit supervision. This differentiability is enabled by our novel Probabilistic Attention mechanism, which incorporates position-specific existence probabilities to simulate soft masking over…
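As a rough illustration of what "soft masking" by existence probabilities could look like, the sketch below shifts each attention logit by the log of the key position's existence probability before the softmax. This is an assumption made for illustration only, not the paper's stated formulation: the function name `soft_masked_attention`, the `exist_probs` parameter, and the log-shift rule are all hypothetical. Under this rule a probability of 1 leaves standard attention unchanged, and a probability near 0 removes the position almost entirely, while the weights stay differentiable in the probabilities.

```python
import math

def soft_masked_attention(scores, exist_probs, eps=1e-9):
    """Hypothetical sketch: softmax over keys with existence-weighted logits.

    scores      -- list of rows, one row of raw attention logits per query
    exist_probs -- assumed per-key probability in [0, 1] that the position exists
    eps         -- small constant so that log(0) is never evaluated
    """
    weights = []
    for row in scores:
        # Adding log(p) to a logit multiplies the unnormalized weight by p.
        shifted = [s + math.log(p + eps) for s, p in zip(row, exist_probs)]
        # Subtract the row maximum for numerical stability.
        m = max(shifted)
        exps = [math.exp(x - m) for x in shifted]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# One query attending over three keys with equal raw scores.
# The third key has existence probability 0, so it is effectively masked.
weights = soft_masked_attention([[0.0, 0.0, 0.0]], [1.0, 0.5, 0.0])

# With all existence probabilities at 1, this reduces to an ordinary softmax.
plain = soft_masked_attention([[1.0, 2.0]], [1.0, 1.0])
```

In this toy case the first key receives roughly twice the weight of the second, mirroring the 1.0 versus 0.5 existence probabilities, and the third key receives almost none.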
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
