Multilingual Pretraining for Pixel Language Models
Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott

TL;DR
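This paper introduces PIXEL-M4, a pixel language model pretrained on four visually and linguistically diverse languages (English, Hindi, Ukrainian, and Simplified Chinese). It outperforms an English-only counterpart on non-Latin scripts in semantic and syntactic tasks and captures rich linguistic features, even in languages not seen during pretraining.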
Contribution
The paper presents PIXEL-M4, the first pixel language model pretrained on multiple languages, and demonstrates improved cross-lingual transfer and linguistic feature capture.
Findings
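PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts in semantic and syntactic tasks.
Word-level probing shows that it captures rich linguistic features, even in languages not seen during pretraining.
Multilingual pretraining yields a semantic embedding space closely aligned across the pretraining languages.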
Abstract
Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual…
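The abstract reports that the semantic embedding space is closely aligned across the pretraining languages. One common way to quantify this kind of alignment is parallel-sentence retrieval with cosine similarity; the sketch below illustrates that idea and is not necessarily the analysis used in the paper. The function names and the toy 3-dimensional vectors are hypothetical. In practice the vectors would be pooled hidden states from the model for translated sentence pairs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieval_accuracy(src, tgt):
    """Fraction of source sentences whose most similar target is their translation.

    Assumes src[i] and tgt[i] embed the same sentence in two languages.
    """
    hits = 0
    for i, u in enumerate(src):
        sims = [cosine(u, v) for v in tgt]
        best = max(range(len(tgt)), key=lambda j: sims[j])
        hits += int(best == i)
    return hits / len(src)

# Toy, hand-written vectors standing in for sentence embeddings (assumption:
# real inputs would come from the model's hidden representations).
english = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
hindi = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.1, 0.9]]

print(retrieval_accuracy(english, hindi))
```

A score near 1.0 would suggest that translations sit close together in the embedding space, while a score near chance (1 divided by the number of sentences) would suggest little alignment.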
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
