UIPress: Bringing Optical Token Compression to UI-to-Code Generation

Dasen Dai; Shuoqi Li; Ronghao Chen; Huacan Wang; Biao Wu; Qizhen Lan

arXiv:2604.09442 · cs.CL · April 13, 2026

TL;DR

UIPress introduces a learned optical compression module for UI-to-Code generation. It compresses approximately 6,700 visual tokens to 256, achieves a 9.1× speedup in time-to-first-token, and outperforms baseline models with a 7.5% higher CLIP score.

Contribution

UIPress is presented as the first encoder-side learned compression approach for UI-to-Code. It combines depthwise-separable convolutions, element-guided spatial reweighting, and Transformer refinement to compress visual tokens with minimal additional parameters.

Findings

01. Achieves a 9.1× speedup in time-to-first-token.

02. Outperforms baseline models with a 7.5% higher CLIP score.

03. Compresses approximately 6,700 visual tokens to 256 tokens.
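
As a quick sanity check on the figures above, the token reduction can be worked out directly. This is a minimal sketch: the variable names are illustrative, and 6,700 is the approximate count quoted in the findings, so the resulting ratio is approximate too.

```python
# Figures quoted in the findings; 6,700 is stated as approximate.
visual_tokens = 6700
compressed_tokens = 256
reported_ttft_speedup = 9.1

# Ratio of visual tokens before and after compression.
token_ratio = visual_tokens / compressed_tokens
print(f"{token_ratio:.1f}x fewer visual tokens")  # prints "26.2x fewer visual tokens"

# 256 tokens would correspond to a 16 x 16 grid, if the output is square
# (an assumption; the listing does not state the output layout).
side = int(compressed_tokens ** 0.5)
print(side * side == compressed_tokens)  # prints "True"
```

The reported time-to-first-token speedup (9.1×) is smaller than the roughly 26× token reduction. One plausible reading is that prefill time includes costs that do not shrink with the visual token count, though the listing does not break this down.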

Abstract

UI-to-Code generation requires vision-language models (VLMs) to produce thousands of tokens of structured HTML/CSS from a single screenshot, making visual token efficiency critical. Existing compression methods either select tokens at inference time using task-agnostic heuristics, or zero out low-attention features without actually shortening the sequence -- neither truly reduces prefill latency or adapts to the non-uniform information density of UI screenshots. Meanwhile, optical (encoder-side learned) compression has shown strong results for document OCR, yet no prior work has adapted this paradigm to UI-to-Code generation. We propose UIPress, a lightweight learned compression module inserted between the frozen ViT encoder and the LLM decoder of Qwen3-VL-8B. UIPress combines depthwise-separable convolutions, element-guided spatial reweighting, and Transformer refinement to compress…
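
The distinction the abstract draws between zeroing out features and actually shortening the sequence can be illustrated with a small pure-Python sketch. It is not the UIPress module: fixed weighted average pooling stands in for the learned depthwise-separable convolutions, the Transformer refinement step is omitted, and the element mask, grid size (82 × 82 = 6,724, close to the roughly 6,700 tokens quoted), feature width, and function names are all hypothetical.

```python
import random

def zero_out_low_scores(tokens, scores, keep):
    """Masking-style baseline: zero the features of low-scoring tokens.
    The sequence length is unchanged, so the decoder still sees every position."""
    threshold = sorted(scores, reverse=True)[keep - 1]
    return [t if s >= threshold else [0.0] * len(t) for t, s in zip(tokens, scores)]

def weighted_grid_pool(grid, weights, out_h, out_w):
    """Pool an H x W grid of feature vectors down to out_h x out_w vectors.
    Each output is a weighted mean over its bin, so positions with larger
    weights (e.g. UI elements) contribute more than the background."""
    in_h, in_w, dim = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for i in range(out_h):
        r0, r1 = i * in_h // out_h, (i + 1) * in_h // out_h
        for j in range(out_w):
            c0, c1 = j * in_w // out_w, (j + 1) * in_w // out_w
            acc, total = [0.0] * dim, 0.0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    w = weights[r][c]
                    total += w
                    for d in range(dim):
                        acc[d] += w * grid[r][c][d]
            pooled.append([a / total for a in acc])
    return pooled

random.seed(0)
H = W = 82   # hypothetical grid; 82 * 82 = 6724 tokens
DIM = 4      # toy feature width, far smaller than a real ViT embedding
grid = [[[random.random() for _ in range(DIM)] for _ in range(W)] for _ in range(H)]

# Hypothetical element mask: a rectangle of "UI elements" weighted 2.0, background 1.0.
weights = [[2.0 if 20 <= r < 40 and 10 <= c < 70 else 1.0 for c in range(W)]
           for r in range(H)]

flat = [tok for row in grid for tok in row]
flat_scores = [w for row in weights for w in row]

masked = zero_out_low_scores(flat, flat_scores, keep=256)
compressed = weighted_grid_pool(grid, weights, 16, 16)

print(len(flat), len(masked), len(compressed))  # prints "6724 6724 256"
```

Only the pooled output reduces the number of positions passed to the decoder. The masked variant keeps all 6,724 positions, which is the limitation the abstract attributes to methods that zero out low-attention features.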
