OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution
Chong Xia, Fangfu Liu, Yule Wang, Yize Pang, Yueqi Duan

TL;DR
OnlineX is a real-time online 3D reconstruction framework that effectively balances local detail capture and global structure preservation by decoupling active and stable memory states, enabling continuous scene understanding from streaming images.
Contribution
OnlineX introduces a decoupled active-to-stable state evolution paradigm for online 3D reconstruction, integrating visual and language fields with Gaussian fusion for improved accuracy.
Findings
Outperforms prior methods in novel view synthesis
Achieves robust semantic understanding across sequences
Operates with real-time inference speed
Abstract
Recent advances in generalizable 3D Gaussian Splatting (3DGS) have enabled rapid 3D scene reconstruction within seconds, eliminating the need for per-scene optimization. However, existing methods primarily follow an offline reconstruction paradigm and lack the capacity for continuous reconstruction, which limits their applicability to online scenarios such as robotics and VR/AR. In this paper, we introduce OnlineX, a feed-forward framework that reconstructs both 3D visual appearance and language fields in an online manner using only streaming images. A key challenge in the online formulation is cumulative drift, which is rooted in the fundamental conflict between two opposing roles of the memory state: an active role that constantly refreshes to capture high-frequency local geometry, and a stable role that conservatively accumulates and preserves the long-term global structure. To…
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
