VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning

Vivek Madhavaram; Vartika Sengar; Arkadipta De; Charu Sharma

arXiv:2602.00637 · cs.CV · February 3, 2026

TL;DR

VIZOR is a training-free, end-to-end framework that generates viewpoint-invariant 3D scene graphs with open-vocabulary relationships, improving generalization and accuracy in scene understanding and reasoning tasks.

Contribution

VIZOR introduces a novel zero-shot, viewpoint-invariant scene graph generation method directly from raw 3D data, without requiring training or annotated relationships.

Findings

01 Outperforms state-of-the-art methods in scene graph generation

02 Achieves 22% and 4.81% improvements in zero-shot grounding accuracy on two datasets

03 Provides consistent spatial relationships regardless of viewpoint

Abstract

Scene understanding and reasoning has been a fundamental problem in 3D computer vision, requiring models to identify objects, their properties, and the spatial or comparative relationships among them. Existing approaches enable this by creating scene graphs from multiple inputs such as 2D images, depth maps, object labels, and relationships annotated from a specific reference view. However, these methods often struggle with generalization and produce inaccurate spatial relationships such as "left/right", which become inconsistent across different viewpoints. To address these limitations, we propose Viewpoint-Invariant Zero-shot scene graph generation for 3D scene Reasoning (VIZOR). VIZOR is a training-free, end-to-end framework that constructs dense, viewpoint-invariant 3D scene graphs directly from raw 3D scenes. The generated scene graph is unambiguous, as spatial relationships are…
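The viewpoint problem the abstract describes can be illustrated with a small sketch. This is not VIZOR's method; it only shows why a camera-relative "left/right" label flips when the observer moves. The function name `relation_from_viewer`, the 2D floor-plane coordinates, and the convention that the viewer looks at the midpoint of the two objects are all assumptions made for illustration.

```python
def relation_from_viewer(a, b, viewer):
    """Return 'left' or 'right' for object a relative to object b,
    as seen by a viewer standing at `viewer` and looking at the
    midpoint of the two objects (hypothetical convention).

    All points are (x, y) tuples on the floor plane, z pointing up.
    """
    # Forward direction: from the viewer toward the midpoint of a and b.
    mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    fx, fy = mid[0] - viewer[0], mid[1] - viewer[1]

    # Right-hand direction of the viewer: forward rotated 90 degrees clockwise.
    rx, ry = fy, -fx

    # Signed lateral offset of each object along the viewer's right axis.
    lat_a = (a[0] - viewer[0]) * rx + (a[1] - viewer[1]) * ry
    lat_b = (b[0] - viewer[0]) * rx + (b[1] - viewer[1]) * ry

    return "left" if lat_a < lat_b else "right"


# Two fixed objects, observed from opposite sides of the scene.
chair = (-1.0, 0.0)
table = (1.0, 0.0)

front_view = relation_from_viewer(chair, table, viewer=(0.0, -5.0))
back_view = relation_from_viewer(chair, table, viewer=(0.0, 5.0))

print(front_view, back_view)
```

The object positions never change, yet the label differs between the two viewers, which is the inconsistency a viewpoint-invariant scene graph is meant to avoid.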

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning