ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR Environments

Shiyi Ding; Shaoen Wu; Ying Chen

arXiv:2603.06648 · cs.CV · March 10, 2026

TL;DR

This paper introduces a dataset and a framework for answering questions about object state changes in VR environments from continuous egocentric views. It targets changes that occur in the background without direct user interaction, a scenario that previously had no benchmark, and reports better performance than baseline methods.

Contribution

The paper presents ObjChangeVR-Dataset, a benchmark for the question-answering task of object state change, and ObjChangeVR, a framework that combines viewpoint-aware and temporal retrieval with cross-view reasoning over continuous VR views.

Findings

01. ObjChangeVR outperforms baseline methods on the new benchmark.

02. The framework effectively identifies relevant frames and reasons across multiple viewpoints.

03. Extensive experiments validate the approach's robustness and accuracy.

Abstract

Recent advances in multimodal large language models (MLLMs) offer a promising approach for natural language-based scene change queries in virtual reality (VR). Prior work on applying MLLMs for object state understanding has focused on egocentric videos that capture the camera wearer's interactions with objects. However, object state changes may occur in the background without direct user interaction, lacking explicit motion cues and making them difficult to detect. Moreover, no benchmark exists for evaluating this challenging scenario. To address these challenges, we introduce ObjChangeVR-Dataset, specifically for benchmarking the question-answering task of object state change. We also propose ObjChangeVR, a framework that combines viewpoint-aware and temporal-based retrieval to identify relevant frames, along with cross-view reasoning that reconciles inconsistent evidence from multiple…
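The retrieval step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and gives no algorithmic details, so everything here is an assumption made for illustration, including the `Frame` fields, the 2D ground-plane geometry, the field-of-view test in `sees`, and the rule in `retrieve` of keeping the earliest and latest frames in which the queried object is in view.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical record for one egocentric frame (not from the paper)."""
    t: float    # capture time in seconds
    x: float    # camera position on the ground plane
    y: float
    yaw: float  # camera heading in radians, 0 = +x axis


def sees(frame, obj_xy, fov=math.radians(90), max_dist=10.0):
    """Viewpoint filter (assumed): is the object inside the camera's view cone?"""
    dx, dy = obj_xy[0] - frame.x, obj_xy[1] - frame.y
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > max_dist:
        return False
    bearing = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi) before comparing to the half-FOV.
    diff = (bearing - frame.yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= fov / 2


def retrieve(frames, obj_xy, k=2):
    """Temporal selection (assumed): keep the k earliest and k latest frames
    that pass the viewpoint filter, so a before/after comparison is possible."""
    visible = sorted((f for f in frames if sees(f, obj_xy)), key=lambda f: f.t)
    if len(visible) <= 2 * k:
        return visible
    return visible[:k] + visible[-k:]
```

In a pipeline of this shape, the frames returned by `retrieve` would be passed to an MLLM together with the natural-language question; how ObjChangeVR actually scores viewpoints, selects frames over time, and reconciles conflicting evidence across views is described in the paper itself.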

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition