Multilingual Pretraining for Pixel Language Models

Ilker Kesen; Jonas F. Lotz; Ingo Ziegler; Phillip Rust; Desmond Elliott

arXiv:2505.21265 · cs.CL · December 3, 2025

TL;DR

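This paper introduces PIXEL-M4, a pixel language model pretrained on four visually and linguistically diverse languages (English, Hindi, Ukrainian, and Simplified Chinese). It outperforms an English-only counterpart on non-Latin scripts in semantic and syntactic tasks and captures rich linguistic features, even in languages not seen during pretraining.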

Contribution

The paper presents PIXEL-M4, the first pixel language model pretrained on multiple languages, and demonstrates improved cross-lingual transfer and linguistic feature capture relative to an English-only counterpart.

Findings

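01 PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts in semantic and syntactic tasks.

02 Word-level probing shows that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining.

03 Multilingual pretraining yields a semantic embedding space that is closely aligned across the pretraining languages.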

Abstract

Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling