MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts
Chen Hu, Yintao Tai, Antonio Vergari, Frank Keller, Alessandro Suglia

TL;DR
MIXAR is a generative pixel-based language model trained on eight languages written in multiple scripts. It shows improved performance and robustness across multilingual tasks and model scales, offering an alternative to tokenizer-based approaches.
Contribution
Introduces MIXAR, the first generative pixel-based language model trained on multiple languages and scripts, and reports substantial performance gains and greater robustness relative to previous models.
Findings
MIXAR outperforms previous pixel-based models and comparable tokenizer-based models on discriminative and generative multilingual tasks.
Scaling to 0.5B parameters improves generative capabilities and robustness.
MIXAR remains effective on languages unseen during training.
Abstract
Pixel-based language models are gaining momentum as alternatives to traditional token-based approaches, promising to circumvent tokenization challenges. However, the inherent perceptual diversity across languages poses a significant hurdle for multilingual generalization in pixel space. This paper introduces MIXAR, the first generative pixel-based language model trained on eight different languages that use a range of scripts. We empirically evaluate MIXAR against previous pixel-based models as well as comparable tokenizer-based models, demonstrating substantial performance improvements on discriminative and generative multilingual tasks. Additionally, we show that MIXAR is robust to languages never seen during training. These results are further strengthened when scaling the model to 0.5B parameters, which not only improves its capabilities in generative tasks like LAMBADA…
