Not All Visitors are Bilingual: A Measurement Study of the Multilingual Web from an Accessibility Perspective
Masudul Hasan Masud Bhuiyan, Matteo Varvello, Yasir Zaki, Cristian-Alexandru Staicu

TL;DR
This study introduces a large dataset of multilingual websites to analyze the accessibility issues faced by visually impaired users. It reveals widespread neglect of language-specific accessibility support and proposes a new accessibility testing extension.
Contribution
The paper presents LangCrUX, a large-scale multilingual web dataset, and Kizuki, an automated accessibility testing tool that improves support for non-Latin scripts.
Findings
Widespread neglect of accessibility hints in multilingual websites
Language-inconsistent hints reduce screen reader effectiveness
Kizuki improves accessibility testing for non-Latin scripts
Abstract
English is the predominant language on the web, powering nearly half of the world's top ten million websites. Support for multilingual content is nevertheless growing, with many websites increasingly combining English with regional or native languages in both visible content and hidden metadata. This multilingualism introduces significant barriers for users with visual impairments, as assistive technologies like screen readers frequently lack robust support for non-Latin scripts and misrender or mispronounce non-English text, compounding accessibility challenges across diverse linguistic contexts. Yet, large-scale studies of this issue have been limited by the lack of comprehensive datasets on multilingual web content. To address this gap, we introduce LangCrUX, the first large-scale dataset of 120,000 popular websites across 12 languages that primarily use non-Latin scripts. Leveraging…
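The abstract notes that screen readers can misrender or mispronounce text that does not match the expected language. The findings add that language-inconsistent hints reduce screen reader effectiveness. The Python snippet below is a rough illustration of the kind of check a language-aware accessibility tester might run. It is a sketch under our own assumptions, not Kizuki's actual implementation. It flags an accessibility hint, such as an `alt` attribute, whose dominant Unicode script differs from the script expected for the page's declared `lang`.

Several parts are assumptions made only for illustration:

- The `EXPECTED_SCRIPT` table and both function names are hypothetical.
- The table is an illustrative subset, not the paper's 12 languages.
- Using the first word of each character's Unicode name as a script label is a heuristic.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical mapping from a page's primary lang subtag to the script its
# text is expected to use. Illustrative subset only, not the paper's list.
EXPECTED_SCRIPT = {
    "hi": "DEVANAGARI",
    "bn": "BENGALI",
    "ru": "CYRILLIC",
    "ar": "ARABIC",
    "ko": "HANGUL",
    "th": "THAI",
}


def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent script label among letters in `text`.

    Heuristic: the first word of a character's Unicode name (e.g. "LATIN",
    "DEVANAGARI", "HANGUL") stands in for its script, because Python's
    standard library does not expose the Unicode Script property.
    Combining marks are skipped since str.isalpha() is False for them.
    """
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    if not counts:
        return None  # no letters at all (empty, digits, punctuation)
    return counts.most_common(1)[0][0]


def is_language_inconsistent(hint: str, page_lang: str) -> bool:
    """Flag a hint whose dominant script differs from the page's expected script."""
    primary = page_lang.split("-")[0].lower()  # "ru-RU" -> "ru"
    expected = EXPECTED_SCRIPT.get(primary)
    script = dominant_script(hint)
    if expected is None or script is None:
        return False  # unknown language or nothing to compare
    return script != expected


if __name__ == "__main__":
    print(is_language_inconsistent("Company logo", "hi"))   # Latin hint on a Hindi page
    print(is_language_inconsistent("कंपनी का लोगो", "hi"))  # Devanagari hint on a Hindi page
```

Here `is_language_inconsistent("Company logo", "hi")` returns `True`, because the hint is in Latin script on a page declared as Hindi. The Devanagari hint is not flagged. Script is only a coarse proxy. Many languages share a script, so a practical tool would likely need real language identification rather than this heuristic alone.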
