Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling

Zhijie Huang; Stephen McIntosh; Daisuke Saito; Nobuaki Minematsu

arXiv:2602.00594·cs.CL·May 6, 2026

TL;DR

Kanade is a single-layer speech tokenizer that keeps phonetics and prosody in one token stream while suppressing speaker identity, without the auxiliary methods that existing disentangled codecs often rely on.

Contribution

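Introduces Kanade, a simple single-layer disentangled speech tokenizer that captures phonetics and prosody while suppressing speaker information. In the authors' experiments it achieves state-of-the-art speaker disentanglement and lexical availability while maintaining reconstruction quality.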

Findings

01 Achieves state-of-the-art speaker disentanglement.

02 Maintains high-quality speech reconstruction.

03 Enhances lexical availability in speech modeling.

Abstract

A good language model starts with a good tokenizer. Tokenization is especially important for speech modeling, which must handle continuous signals that mix linguistic and non-linguistic information. A speech tokenizer should extract phonetics and prosody, suppress linguistically irrelevant information like speaker identity, and enable high-quality synthesis. We present Kanade, a single-layer disentangled speech tokenizer that realizes this ideal. Kanade separates out acoustic constants to create a single stream of tokens that captures rich phonetics and prosody. It does so without the need for auxiliary methods that existing disentangled codecs often rely on. Experiments show that Kanade achieves state-of-the-art speaker disentanglement and lexical availability, while maintaining excellent reconstruction quality.

Code & Models

Repositories

frothywater/kanade-tokenizer (GitHub)
