OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding

Wenyuan Huang; Zhao Wang; Zhou Wei; Ting Huang; Fang Zhao; Jian Yang; Zhenyu Zhang

arXiv:2512.23020 · cs.CV · January 1, 2026

TL;DR

OpenGround introduces a zero-shot framework with an Active Cognition-based Reasoning module that enhances 3D visual grounding in open-world scenarios by dynamically expanding the model's understanding beyond predefined object categories.

Contribution

The paper proposes OpenGround, a novel open-world 3D visual grounding method with an Active Cognition-based Reasoning module that overcomes the limitations of pre-defined object lookup tables.

Findings

01 Achieves competitive performance on Nr3D

02 State-of-the-art on ScanRefer

03 17.6% improvement on OpenTarget

Abstract

3D visual grounding aims to locate objects based on natural language descriptions in 3D scenes. Existing methods rely on a pre-defined Object Lookup Table (OLT) to query Visual Language Models (VLMs) for reasoning about object locations, which limits the applications in scenarios with undefined or unforeseen targets. To address this problem, we present OpenGround, a novel zero-shot framework for open-world 3D visual grounding. Central to OpenGround is the Active Cognition-based Reasoning (ACR) module, which is designed to overcome the fundamental limitation of pre-defined OLTs by progressively augmenting the cognitive scope of VLMs. The ACR module performs human-like perception of the target via a cognitive task chain and actively reasons about contextually relevant objects, thereby extending VLM cognition through a dynamically updated OLT. This allows OpenGround to function with both…
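
The idea of a dynamically updated OLT can be illustrated with a minimal sketch. This is not the authors' implementation: the `ObjectLookupTable` class, the `expand_olt` function, and the `toy_detect` stub are hypothetical names introduced here, and the hard-coded scene stands in for whatever perception step the ACR module actually uses. The sketch only shows the general pattern of adding objects to the table on demand when a query mentions categories the table does not yet contain.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectLookupTable:
    """Hypothetical OLT: maps a category label to a list of 3D box centers."""
    entries: dict = field(default_factory=dict)

    def has(self, label):
        return label in self.entries

    def add(self, label, boxes):
        self.entries.setdefault(label, []).extend(boxes)


def expand_olt(olt, relevant_labels, detect):
    """Add every query-relevant label missing from the table.

    `detect` is an assumed callback that returns box centers for a label,
    or an empty list when nothing is found in the scene.
    """
    added = []
    for label in relevant_labels:
        if olt.has(label):
            continue  # already within the table's scope
        boxes = detect(label)
        if boxes:
            olt.add(label, boxes)
            added.append(label)
    return added


def toy_detect(label):
    """Stand-in for a real open-vocabulary detector (assumption)."""
    scene = {"mug": [(1.0, 0.5, 0.8)], "desk": [(1.0, 0.4, 0.4)]}
    return scene.get(label, [])


# The table starts with one pre-defined category and is extended on demand.
olt = ObjectLookupTable({"chair": [(0.0, 0.0, 0.5)]})
added = expand_olt(olt, ["chair", "desk", "mug", "unicorn"], toy_detect)
```

In this toy run the table starts with only `chair`, gains `desk` and `mug`, and skips `unicorn` because the stub detector finds nothing for it. A real system would replace `toy_detect` with an actual perception model and would decide which labels are relevant through reasoning over the query.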

Datasets

Inmny/OpenTarget

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Human Pose and Action Recognition