WorldComp2D: Spatio-semantic Representations of Object Identity and Location from Local Views

SeongMin Jin; Doo Seok Jeong

arXiv:2605.11743 · cs.CV · May 13, 2026

TL;DR

WorldComp2D is a lightweight framework that explicitly structures its latent space for efficient spatio-semantic reasoning. It is demonstrated on facial landmark localization at reduced computational cost.

Contribution

The paper proposes a framework that explicitly structures latent space geometry according to object identity and spatial proximity, rather than relying on implicit latent structure with dense feature maps or task-specific heads, which improves computational efficiency.

Findings

01 Reduces parameters by up to 4.0× and FLOPs by up to 2.2×, compared to SoTA lightweight models.

02 Maintains real-time performance on CPU.

03 Demonstrates effectiveness on facial landmark localization, used as a proof-of-concept task.
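To make the reported reduction factors concrete, the short sketch below converts an "N× fewer" factor into a reduced cost and a percentage saved. The baseline figure of 4.0 million parameters is purely hypothetical and is not taken from the paper; only the 4.0× and 2.2× factors come from the findings above.

```python
def reduced_cost(baseline: float, factor: float) -> float:
    """Cost remaining after an 'N-times' reduction (baseline / N)."""
    return baseline / factor


def percent_saved(factor: float) -> float:
    """Share of the baseline cost removed by an 'N-times' reduction."""
    return 100.0 * (1.0 - 1.0 / factor)


# Hypothetical baseline, chosen only for illustration (not from the paper).
hypothetical_baseline_params = 4.0e6

print(reduced_cost(hypothetical_baseline_params, 4.0))  # 1000000.0
print(percent_saved(4.0))                               # 75.0
print(round(percent_saved(2.2), 1))                     # 54.5
```

In other words, a 4.0× parameter reduction removes 75% of the baseline parameters, and a 2.2× FLOPs reduction removes roughly 54.5% of the baseline compute.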

Abstract

Learning latent representations that capture both semantic and spatial information is central to efficient spatio-semantic reasoning. However, many existing approaches rely on implicit latent structures combined with dense feature maps or task-specific heads, limiting computational efficiency and flexibility. We propose WorldComp2D, a novel lightweight representation learning framework that explicitly structures latent space geometry according to object identity and spatial proximity using multiscale local receptive fields. This framework consists of (i) a proximity-dependent encoder that maps a given observation into a spatio-semantic latent space and (ii) a localizer that infers the coordinates of objects in the input from the resulting spatio-semantic representation. Using facial landmark localization as a proof-of-concept, we show that, compared to SoTA lightweight models,…
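The two-stage data flow described in the abstract (encode a local observation into a latent vector, then infer coordinates from that vector) can be sketched roughly as follows. This is not the authors' implementation: the class names `Encoder` and `Localizer`, the patch radii, the pooling step, and the latent size are all assumptions made for illustration, and the weights are random and untrained, so the latent space here has none of the identity- or proximity-dependent structure the paper describes. The sketch shows only how multiscale local views could feed a compact latent vector that a small head maps to a 2D output.

```python
import random


def crop_local_view(image, cx, cy, radius):
    """Flattened (2*radius+1)^2 patch centered at (cx, cy), zero-padded at borders."""
    h, w = len(image), len(image[0])
    patch = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            patch.append(image[y][x] if 0 <= y < h and 0 <= x < w else 0.0)
    return patch


def average_pool(values, out_len):
    """Reduce a flat list to out_len values by averaging contiguous chunks."""
    n = len(values)
    pooled = []
    for i in range(out_len):
        start = i * n // out_len
        end = max((i + 1) * n // out_len, start + 1)
        chunk = values[start:end]
        pooled.append(sum(chunk) / len(chunk))
    return pooled


def linear(vec, weights):
    """Matrix-vector product with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Encoder:
    """Hypothetical encoder: multiscale local views -> latent vector."""

    def __init__(self, radii=(1, 2, 4), pooled_len=4, latent_dim=8, seed=0):
        rng = random.Random(seed)
        self.radii = radii
        self.pooled_len = pooled_len
        # Untrained random projection; a real model would learn these weights.
        self.weights = random_matrix(latent_dim, pooled_len * len(radii), rng)

    def __call__(self, image, cx, cy):
        features = []
        for r in self.radii:  # one local receptive field per scale
            patch = crop_local_view(image, cx, cy, r)
            features.extend(average_pool(patch, self.pooled_len))
        return linear(features, self.weights)


class Localizer:
    """Hypothetical localizer: latent vector -> 2D coordinate estimate."""

    def __init__(self, latent_dim=8, seed=1):
        self.weights = random_matrix(2, latent_dim, random.Random(seed))

    def __call__(self, latent):
        x, y = linear(latent, self.weights)
        return x, y


# Toy 16x16 grayscale "image" and a single local view at its center.
image = [[float((x + y) % 5) for x in range(16)] for y in range(16)]
encoder, localizer = Encoder(), Localizer()
latent = encoder(image, cx=8, cy=8)
estimate = localizer(latent)
print(len(latent), len(estimate))  # 8 2
```

Note that the localizer consumes only the 8-dimensional latent vector, not a dense feature map, which is the kind of compact interface the abstract contrasts with dense-map or task-specific-head designs.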

Code & Models

Repositories

JinSeongmin/WorldComp2D (GitHub)
