PEARL: Geometry Aligns Semantics for Training-Free Open-Vocabulary Semantic Segmentation

Gensheng Pei; Xiruo Jiang; Xinhao Cai; Tao Chen; Yazhou Yao; Byeungwoo Jeon

arXiv:2603.21528 · cs.CV · March 24, 2026



TL;DR

PEARL is an efficient, training-free method for open-vocabulary semantic segmentation. It aligns key and query geometry in the final self-attention block and then refines per-pixel predictions with text-aware Laplacian propagation, improving accuracy without additional training or auxiliary models.

Contribution

PEARL introduces a novel, training-free, two-step inference procedure for open-vocabulary segmentation that combines Procrustes geometry alignment with text-aware Laplacian propagation.
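PEARL's alignment step is an orthogonal (Procrustes) projection inside the last self-attention block that rotates keys toward the query subspace with a stable polar iteration. The following is a minimal NumPy sketch of that idea, not the authors' code. It assumes per-head query and key matrices of shape (tokens, head_dim) and uses a Newton-Schulz iteration as the polar solver. The paper's exact iteration, scaling, and how the rotated keys are used afterwards are not given here, so `polar_newton_schulz` and `procrustes_align_keys` are hypothetical helpers.

```python
import numpy as np


def polar_newton_schulz(m: np.ndarray, num_iters: int = 30) -> np.ndarray:
    """Approximate the orthogonal polar factor U @ V.T of a square matrix m = U S V.T.

    Newton-Schulz step: X <- 1.5 X - 0.5 X X^T X, which maps each singular
    value s to 1.5 s - 0.5 s^3 and drives it to 1. Dividing by the Frobenius
    norm puts every singular value in (0, 1], inside the convergence region
    (0, sqrt(3)). m is assumed to be full rank.
    """
    x = m / np.linalg.norm(m)  # np.linalg.norm of a matrix defaults to Frobenius
    for _ in range(num_iters):
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x


def procrustes_align_keys(q: np.ndarray, k: np.ndarray, num_iters: int = 30):
    """Rotate keys k (n x d) toward queries q (n x d).

    Solves min_R ||k @ R - q||_F subject to R^T R = I; the minimiser is the
    polar factor of k^T q. Returns the rotated keys and R.
    """
    r = polar_newton_schulz(k.T @ q, num_iters)
    return k @ r, r


# Toy check (hypothetical sizes): queries are an exact rotation of the keys,
# so the recovered R should match the rotation that generated them.
rng = np.random.default_rng(0)
keys = rng.standard_normal((200, 8))           # 200 tokens, head_dim = 8
r_true, _ = np.linalg.qr(rng.standard_normal((8, 8)))
queries = keys @ r_true
aligned, r = procrustes_align_keys(queries, keys)
```

Newton-Schulz uses only matrix products, which makes it convenient on accelerators. Taking the SVD of `k.T @ q` and forming `U @ V.T` gives the same rotation and is a useful cross-check.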

Findings

1. Sets a new state of the art on training-free OVSS benchmarks.
2. Achieves superior performance with minimal latency and no extra data.
3. Operates effectively under both with-background and without-background protocols.

Abstract

Training-free open-vocabulary semantic segmentation (OVSS) promises rapid adaptation to new label sets without retraining. Yet, many methods rely on heavy post-processing or handle text and vision in isolation, leaving cross-modal geometry underutilized. Others introduce auxiliary vision backbones or multi-model pipelines, which increase complexity and latency while compromising design simplicity. We present PEARL (Procrustes alignment with text-aware Laplacian propagation), a compact two-step inference that follows an align-then-propagate principle. The Procrustes alignment step performs an orthogonal projection inside the last self-attention block, rotating keys toward the query subspace via a stable polar iteration. The text-aware Laplacian propagation then refines per-pixel logits…
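The second step refines per-pixel logits with text-aware Laplacian propagation, but the abstract is truncated before the details. As a hedged illustration only, the sketch below applies standard Laplacian (Tikhonov) smoothing on a 4-connected pixel grid, solving (I + lam * L) Y = Y0. The edge weights compare per-pixel softmax class distributions and are a hypothetical stand-in for PEARL's text-aware affinities. `laplacian_propagate`, `lam`, and `tau` are illustrative names, not the paper's.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def grid_edges(h: int, w: int) -> np.ndarray:
    """Return (E, 2) node-index pairs for a 4-connected h x w grid."""
    idx = np.arange(h * w).reshape(h, w)
    right = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1)
    down = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1)
    return np.concatenate([right, down], axis=0)


def laplacian_propagate(logits: np.ndarray, lam: float = 1.0, tau: float = 0.5) -> np.ndarray:
    """Refine (H, W, C) per-pixel logits by solving (I + lam * L) Y = Y0.

    Y minimises ||Y - Y0||^2 + lam * tr(Y^T L Y), where L = D - W is the
    graph Laplacian of the pixel grid.
    """
    h, w, c = logits.shape
    y0 = logits.reshape(-1, c)
    # Hypothetical "text-aware" weights: neighbours whose class distributions
    # (softmax over the text-class logits) agree are coupled more strongly.
    p = softmax(y0, axis=1)
    edges = grid_edges(h, w)
    i, j = edges[:, 0], edges[:, 1]
    wts = np.exp(-np.sum((p[i] - p[j]) ** 2, axis=1) / tau)
    n = h * w
    adj = np.zeros((n, n))
    adj[i, j] = wts
    adj[j, i] = wts
    lap = np.diag(adj.sum(axis=1)) - adj
    y = np.linalg.solve(np.eye(n) + lam * lap, y0)
    return y.reshape(h, w, c)


# Toy example: a uniform class-0 region with one noisy pixel that prefers class 1.
logits = np.tile([2.0, 0.0], (6, 6, 1))
logits[2, 3] = [0.0, 0.5]
refined = laplacian_propagate(logits, lam=1.0, tau=0.5)
print(logits[2, 3].argmax(), refined[2, 3].argmax())  # 1 0
```

Because L is symmetric with zero row sums, the solve preserves each class's total logit mass while pulling the outlier toward its neighbours. The dense N x N solve is only practical for toy grids. At real feature-map resolutions, a sparse matrix with an iterative solver such as conjugate gradients is the natural choice, since I + lam * L is symmetric positive definite.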


Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning