Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation
Viktor Muryn, Marta Sumyk, Mariya Hirna, Sofiya Garkot, Maksym Shamrai

TL;DR
Screen2AX is a novel vision-based framework that automatically generates real-time, hierarchical accessibility metadata from macOS screenshots, significantly enhancing AI agents' ability to interpret complex desktop interfaces.
Contribution
It introduces the first method to create full hierarchical accessibility metadata from screenshots and provides datasets and benchmarks for macOS accessibility research.
Findings
Achieves 77% F1 score in reconstructing accessibility trees.
Delivers a 2.2x performance improvement over native accessibility representations.
Outperforms state-of-the-art systems on relevant benchmarks.
Abstract
Desktop accessibility metadata enables AI agents to interpret screens and supports users who depend on tools like screen readers. Yet many applications remain largely inaccessible due to incomplete or missing metadata provided by developers: our investigation shows that only 33% of applications on macOS offer full accessibility support. While recent work on structured screen representation has primarily addressed specific challenges, such as UI element detection or captioning, none has attempted to capture the full complexity of desktop interfaces by replicating their entire hierarchical structure. To bridge this gap, we introduce Screen2AX, the first framework to automatically create real-time, tree-structured accessibility metadata from a single screenshot. Our method uses vision-language and object detection models to detect, describe, and organize UI elements hierarchically,…
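The summary above does not say how detected elements are arranged into a tree. To illustrate the kind of output involved, here is a minimal sketch that nests already-detected elements by bounding-box containment and orders siblings top-to-bottom, left-to-right. It is a simple geometric heuristic, not the Screen2AX method, which relies on vision-language and object detection models. The `Node` class, `build_tree`, `dump`, and the example elements are hypothetical. The role strings only mirror macOS accessibility role names such as `AXButton` and `AXGroup`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in screen pixels


@dataclass
class Node:
    # Hypothetical element record; roles mimic macOS AX role names.
    role: str
    label: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def _sort_reading_order(node: Node) -> None:
    # Order siblings top-to-bottom, then left-to-right.
    node.children.sort(key=lambda c: (c.box[1], c.box[0]))
    for child in node.children:
        _sort_reading_order(child)


def build_tree(nodes: List[Node], screen: Box) -> Node:
    """Nest each element under the smallest placed element that contains it."""
    root = Node("AXWindow", "window", screen)
    # Largest first, so every container is placed before its contents.
    for node in sorted(nodes, key=lambda n: area(n.box), reverse=True):
        parent = root
        while True:
            nxt = next((c for c in parent.children
                        if contains(c.box, node.box)), None)
            if nxt is None:
                break
            parent = nxt
        parent.children.append(node)
    _sort_reading_order(root)
    return root


def dump(node: Node, depth: int = 0) -> List[str]:
    lines = ["  " * depth + f"{node.role} '{node.label}'"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    elements = [
        Node("AXButton", "Back", (10, 10, 40, 40)),
        Node("AXStaticText", "Inbox", (300, 15, 500, 35)),
        Node("AXGroup", "toolbar", (0, 0, 800, 50)),
        Node("AXGroup", "sidebar", (0, 50, 200, 600)),
    ]
    print("\n".join(dump(build_tree(elements, (0, 0, 800, 600)))))
```

Running the example nests the back button and title under the toolbar group and leaves the sidebar as a sibling. Pure containment cannot recover groupings that have no enclosing box, such as a row of loosely spaced controls. That limitation is one reason a learned grouping step is attractive.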
Taxonomy
Topics: Digital Accessibility for Disabilities · Tactile and Sensory Interactions · Subtitles and Audiovisual Media
