ScribeTokens: Fixed-Vocabulary Tokenization of Digital Ink

Douglass Wang

arXiv:2603.02805 · cs.CV · March 4, 2026

TL;DR

ScribeTokens is a fixed-vocabulary tokenization of digital ink that improves recognition accuracy and training efficiency. On handwritten text generation it far outperforms vector representations (17.33% vs. 70.29% CER), and on recognition it is the only token representation to outperform vectors without pretraining.

Contribution

The paper proposes ScribeTokens, a fixed-vocabulary tokenization of digital ink that improves recognition performance and training speed. It also introduces a self-supervised pretraining strategy, next-ink-token prediction, which improves convergence and accuracy.

Findings

01. ScribeTokens outperforms vector representations in recognition accuracy.

02. Pretraining with next-ink-token prediction improves convergence and accuracy.

03. ScribeTokens achieves state-of-the-art results on the IAM and DeepWriting datasets.

Abstract

Digital ink -- the coordinate stream captured from stylus or touch input -- lacks a unified representation. Continuous vector representations produce long sequences and suffer from training instability, while existing token representations require large vocabularies, face out-of-vocabulary issues, and underperform vectors on recognition. We propose ScribeTokens, a tokenization that decomposes pen movement into unit pixel steps. Together with two pen-state tokens, this fixed 10-token base vocabulary suffices to represent any digital ink and enables aggressive BPE compression. On handwritten text generation, ScribeTokens dramatically outperforms vectors (17.33% vs. 70.29% CER), showing tokens are far more effective for generation. On recognition, ScribeTokens is the only token representation to outperform vectors without pretraining. We further introduce next-ink-token prediction as a…
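
The decomposition into unit pixel steps can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the 10-token base vocabulary consists of 8 unit steps on an 8-connected pixel grid plus the two pen-state tokens, that the y axis points down as in screen coordinates, and that pen-up travel between strokes is also written as unit steps. The token names, the `encode`/`decode` helpers, and the rounding rule used to rasterize each segment are all hypothetical, and the BPE compression stage is left out.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Hypothetical base vocabulary: 8 unit-step directions + 2 pen-state tokens = 10.
# Assumes screen coordinates (y grows downward), so (1, 1) is "SE".
DIRS = {
    (1, 0): "E", (1, 1): "SE", (0, 1): "S", (-1, 1): "SW",
    (-1, 0): "W", (-1, -1): "NW", (0, -1): "N", (1, -1): "NE",
}
STEP = {name: delta for delta, name in DIRS.items()}
PEN_DOWN, PEN_UP = "DOWN", "UP"


def unit_steps(dx: int, dy: int) -> List[str]:
    """Split an integer displacement into 8-connected unit steps."""
    n = max(abs(dx), abs(dy))
    steps, x, y = [], 0, 0
    for i in range(1, n + 1):
        # Nearest grid point on the segment after i steps (integer arithmetic,
        # rounding half up), so each move is at most one pixel per axis.
        tx = (2 * dx * i + n) // (2 * n)
        ty = (2 * dy * i + n) // (2 * n)
        steps.append(DIRS[(tx - x, ty - y)])
        x, y = tx, ty
    return steps


def encode(strokes: List[List[Point]], origin: Point = (0, 0)) -> List[str]:
    """Turn strokes (lists of integer pixel points) into a token sequence."""
    tokens: List[str] = []
    x, y = origin
    for stroke in strokes:
        if not stroke:
            continue
        for k, (px, py) in enumerate(stroke):
            # Travel to the first point happens with the pen up.
            tokens += unit_steps(px - x, py - y)
            x, y = px, py
            if k == 0:
                tokens.append(PEN_DOWN)
        tokens.append(PEN_UP)
    return tokens


def decode(tokens: List[str], origin: Point = (0, 0)) -> List[List[Point]]:
    """Replay tokens and return every pixel visited while the pen is down."""
    strokes: List[List[Point]] = []
    x, y = origin
    current = None
    for tok in tokens:
        if tok == PEN_DOWN:
            current = [(x, y)]
        elif tok == PEN_UP:
            if current is not None:
                strokes.append(current)
            current = None
        else:
            dx, dy = STEP[tok]
            x, y = x + dx, y + dy
            if current is not None:
                current.append((x, y))
    return strokes


tokens = encode([[(0, 0), (3, 1)], [(3, 3), (3, 5)]])
print(" ".join(tokens))  # DOWN E SE E UP S S DOWN S S UP
print(decode(tokens))
```

Because every displacement reduces to the same ten symbols, no coordinate can fall outside the vocabulary in this sketch, and repeated runs such as `S S` are the kind of pattern a BPE merge step could later compress.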

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Interactive and Immersive Displays