Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling
Shuyang Xiang, Hao Guan

TL;DR
This paper explores feeding low-resolution images of Chinese characters to a language model in place of traditional token IDs. It shows that these inputs reach accuracy comparable to index-based tokens and produce a hot-start effect early in training.
Contribution
It introduces an approach to Chinese language modeling in which the decoder receives low-resolution grayscale images of individual characters (as small as 8 × 8 pixels) instead of token IDs, highlighting how useful the characters' visual structure is.
Findings
Low-resolution visual inputs achieve 39.2% accuracy, comparable to the 39.1% of the index-based baseline.
A pronounced hot-start effect: by 0.4% of total training, accuracy exceeds 12%, while index-based models remain below 6%.
Visual inputs provide a robust alternative signal for Chinese language modeling.
Abstract
Large language models typically represent Chinese characters as discrete index-based tokens, largely ignoring their visual form. For logographic scripts, visual structure carries semantic and phonetic information, which may aid prediction. We investigate whether low-resolution visual inputs can serve as an alternative for character-level modeling. Instead of token IDs, our decoder receives grayscale images of individual characters, with resolutions as low as 8 × 8 pixels. Remarkably, these inputs achieve 39.2% accuracy, comparable to the index-based baseline of 39.1%. Such low-resource settings also exhibit a pronounced hot-start effect: by 0.4% of total training, accuracy exceeds 12%, while index-based models remain below 6%. Overall, our results demonstrate that minimal visual structure can provide a robust and efficient signal for Chinese language modeling, offering an…
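The abstract describes replacing token-ID lookups with tiny grayscale glyph images. The sketch below illustrates what such an input layer could look like; it is not the authors' implementation. Everything in it is an assumption rather than a detail from the paper: glyphs arrive as square bitmaps with ink intensity in [0, 1] (rendering from a font is omitted), downsampling to 8 × 8 uses block averaging, and a single hypothetical linear layer (`PixelEmbedding`) maps the 64 pixel values to a `d_model`-dimensional vector, filling the role an embedding-table row plays in an index-based model.

```python
import random
from typing import List

# Assumed format: rows of grayscale ink intensities in [0, 1].
Bitmap = List[List[float]]


def downsample(glyph: Bitmap, size: int = 8) -> Bitmap:
    """Block-average a square glyph bitmap down to size x size pixels.

    Assumption: the input side length is an integer multiple of `size`.
    """
    n = len(glyph)
    if n % size != 0 or any(len(row) != n for row in glyph):
        raise ValueError("glyph must be square with side divisible by size")
    k = n // size  # each output pixel averages a k x k block of input pixels
    out: Bitmap = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [glyph[i * k + di][j * k + dj]
                     for di in range(k) for dj in range(k)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


class PixelEmbedding:
    """Hypothetical input layer: flattens a size x size glyph and applies
    one linear projection to get a d_model-dimensional input vector."""

    def __init__(self, size: int = 8, d_model: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.size = size
        in_dim = size * size
        # weight[o][p] is the contribution of pixel p to output dimension o.
        self.weight = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)]
                       for _ in range(d_model)]
        self.bias = [0.0] * d_model

    def __call__(self, glyph: Bitmap) -> List[float]:
        # Downsample, flatten row-major, then compute weight @ pixels + bias.
        pixels = [v for row in downsample(glyph, self.size) for v in row]
        return [b + sum(w * p for w, p in zip(ws, pixels))
                for ws, b in zip(self.weight, self.bias)]
```

Because the vector is computed from pixels, characters with similar glyphs receive similar inputs before any training, whereas an index lookup starts every character from an unrelated random row. For example, a 32 × 32 bitmap of a single horizontal stroke (as in 一) downsamples to an 8 × 8 grid in which only the two rows covering the stroke are non-zero.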
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
