Does Visual Rendering Bypass Tokenization? Investigating Script-Tokenizer Misalignment in Pixel-Based Language Models

Lucky Susanto; Musa Izzanardi Wijanarko; Khumaisa Nur'aini; Farid Adilazuarda; Alham Fikri Aji; Derry Tanti Wijaya

arXiv:2602.06973·cs.CL·February 10, 2026

TL;DR

This paper investigates whether pixel-based language models truly bypass tokenization issues, finding that reintegrating tokenizers reintroduces misalignment problems, especially affecting low-resource languages with unique scripts.

Contribution

The study reveals that visual rendering does not eliminate tokenization constraints and highlights the importance of tokenizer design for low-resource language modeling.

Findings

01

Reintegrating tokenizers reintroduces misalignment issues.

02

Custom tokenizers outperform standard ones by up to 30.15 chrF++ points.

03

Visual rendering alone does not solve tokenization barriers.

Abstract

While pixel-based language modeling aims to bypass the sub-word tokenization bottleneck by rendering text as images, recent multimodal variants such as DualGPT reintroduce text tokenizers to improve autoregressive performance. We investigate a fundamental question: does visual rendering truly decouple a model from tokenization constraints? Focusing on four Indonesian low-resource local languages that have their own non-Latin scripts (i.e., Javanese, Balinese, Sundanese, and Lampungnese), we evaluate the impact of script-tokenizer alignment within the DualGPT architecture. Our results show that, despite visual rendering, reintegrating a text tokenizer into the architecture reintroduces the very issue that pixel-based language modeling aims to resolve: tokenizer misalignment. Despite having lower OOV and fertility rates, we show that the Llama 2 tokenizer performs…
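The abstract refers to OOV and fertility rates without defining them. As a rough illustration only, the sketch below uses common definitions that are assumed here and may differ from the paper's: fertility as the average number of tokens produced per word, and OOV rate as the fraction of words that yield at least one unknown token. The function `fertility_and_oov`, the `toy_tokenize` helper, and the `<unk>` symbol are all hypothetical and are not taken from the paper or from any real tokenizer.

```python
from typing import Callable, Iterable, List, Tuple


def fertility_and_oov(
    words: Iterable[str],
    tokenize: Callable[[str], List[str]],
    unk_token: str = "<unk>",  # assumed unknown-token symbol
) -> Tuple[float, float]:
    """Return (fertility, oov_rate) under the assumed definitions:
    fertility = total tokens / total words,
    oov_rate  = words containing an unknown token / total words.
    """
    word_list = list(words)
    if not word_list:
        raise ValueError("need at least one word")

    total_tokens = 0
    oov_words = 0
    for word in word_list:
        pieces = tokenize(word)
        total_tokens += len(pieces)
        if unk_token in pieces:
            oov_words += 1

    n = len(word_list)
    return total_tokens / n, oov_words / n


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical character-level tokenizer with a Latin-only vocabulary.
    Any character outside the vocabulary becomes the unknown token."""
    vocab = set("abcdefghijklmnopqrstuvwxyz")
    return [ch if ch in vocab else "<unk>" for ch in word]


if __name__ == "__main__":
    # One Latin word and one word written in Javanese script (two code points).
    sample = ["ab", "\ua997\ua9ae"]
    fertility, oov_rate = fertility_and_oov(sample, toy_tokenize)
    print(f"fertility={fertility:.2f} oov_rate={oov_rate:.2f}")
```

With a real tokenizer, `tokenize` would wrap that tokenizer's own encoding call. The abstract's point is that low values on these two metrics did not, on their own, indicate good script-tokenizer alignment.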

Taxonomy

Topics: Language and cultural evolution · Natural Language Processing Techniques · Topic Modeling