# Enhancing Robustness of Autoregressive Language Models against Orthographic Attacks via Pixel-based Approach

**Authors:** Han Yang, Jian Lan, Yihong Liu, Hinrich Schütze, Thomas Seidl

arXiv: 2508.21206 · 2025-09-01

## TL;DR

This paper introduces a pixel-based generative language model that enhances robustness against orthographic attacks and supports multilingual text by rendering words as images, addressing vulnerabilities of traditional subword tokenizers.

## Contribution

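The paper presents a pixel-based approach that replaces text-based embeddings with pixel-based representations, rendering each word as an individual image. This improves robustness to noisy input and extends compatibility to multilingual text across diverse writing systems.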

## Key findings

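- Reports resilience to orthographic noise, i.e. input text perturbed with characters from multilingual alphabets.
- Reports effectiveness in multilingual settings across diverse writing systems.
- Evaluation covers the multilingual LAMBADA dataset, the WMT24 dataset, and the SST-2 benchmark.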

## Abstract

Autoregressive language models are vulnerable to orthographic attacks, where input text is perturbed with characters from multilingual alphabets, leading to substantial performance degradation. This vulnerability primarily stems from the out-of-vocabulary issue inherent in subword tokenizers and their embeddings. To address this limitation, we propose a pixel-based generative language model that replaces the text-based embeddings with pixel-based representations by rendering words as individual images. This design provides stronger robustness to noisy inputs, while also extending compatibility to multilingual text across diverse writing systems. We evaluate the proposed method on the multilingual LAMBADA dataset, the WMT24 dataset, and the SST-2 benchmark, demonstrating both its resilience to orthographic noise and its effectiveness in multilingual settings.
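
To make the out-of-vocabulary problem concrete, the following is a minimal sketch of an orthographic perturbation, not the paper's attack or model. The homoglyph map, the toy word-level vocabulary, and the helper names `perturb` and `oov_rate` are all assumptions made for illustration. Real subword tokenizers split or fall back differently, so the word-level lookup here only approximates the failure mode.

```python
import random

# Hypothetical homoglyph map (an illustrative assumption, not the paper's
# attack set): Latin letters mapped to visually similar Cyrillic letters.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
    "c": "\u0441",  # Cyrillic small es
}


def perturb(text, rate, seed=0):
    """Swap each mappable character for its look-alike with probability `rate`."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


def oov_rate(text, vocab):
    """Fraction of whitespace-separated words that are missing from `vocab`."""
    words = text.split()
    if not words:
        return 0.0
    missing = sum(1 for w in words if w not in vocab)
    return missing / len(words)


# Toy word-level vocabulary standing in for a tokenizer's known units.
TOY_VOCAB = {"the", "cat", "sat", "on", "a", "mat"}

clean = "the cat sat on a mat"
attacked = perturb(clean, rate=1.0)

print(oov_rate(clean, TOY_VOCAB))     # every clean word is in the vocabulary
print(oov_rate(attacked, TOY_VOCAB))  # every attacked word falls outside it
```

The attacked string looks nearly identical to the clean one when displayed, yet none of its words match the vocabulary. A representation built from rendered word images depends on glyph shape rather than code points, which is the intuition behind the robustness claimed in the abstract.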

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21206/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21206/full.md

---
Source: https://tomesphere.com/paper/2508.21206