DetAny4D: Detect Anything 4D Temporally in a Streaming RGB Video
Jiawei Hou, Shenghao Zhang, Can Wang, Zheng Gu, Yonggen Ling, Taiping Zeng, Xiangyang Xue, Jingbo Zhang

TL;DR
DetAny4D introduces a new large-scale 4D detection dataset and a novel end-to-end framework that fuses multi-modal features and models temporal dynamics to improve accuracy and stability in streaming video object detection.
Contribution
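The paper contributes DA4D, a large-scale 4D detection dataset of over 280k sequences with 3D bounding box annotations, and DetAny4D, an open-set end-to-end framework that predicts 3D bounding boxes directly from sequential inputs by fusing multi-modal features and modeling temporal consistency.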
Findings
DetAny4D achieves competitive detection accuracy.
It significantly improves the temporal stability of predictions.
It addresses jitter and inconsistency in predictions across frames.
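To make the notion of jitter concrete, here is a minimal sketch of one possible stability measure. It is an illustrative assumption and not the metric used in the paper. The hypothetical function `center_jitter` takes the predicted 3D box centers of a single object over consecutive frames and averages the magnitude of their second-order differences, so constant-velocity motion scores near zero while frame-to-frame wobble scores higher.

```python
from math import sqrt


def center_jitter(centers):
    """Mean magnitude of the second-order difference of 3D box centers.

    centers: list of (x, y, z) tuples, one per frame, for one object.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(centers) < 3:
        return 0.0  # need at least three frames for a second difference
    total = 0.0
    for prev, cur, nxt in zip(centers, centers[1:], centers[2:]):
        # Discrete acceleration: nxt - 2 * cur + prev, per coordinate
        accel = [n - 2 * c + p for p, c, n in zip(prev, cur, nxt)]
        total += sqrt(sum(a * a for a in accel))
    return total / (len(centers) - 2)


# An object moving at constant velocity along x, 5 m in front of the camera
smooth = [(0.1 * t, 0.0, 5.0) for t in range(6)]

# The same track with an alternating +/- 5 cm error on x in each frame
noisy = [(0.1 * t + (0.05 if t % 2 else -0.05), 0.0, 5.0) for t in range(6)]

print(center_jitter(smooth))  # close to zero
print(center_jitter(noisy))   # clearly larger
```

A per-frame detector can be accurate on every individual frame and still score poorly on a measure like this, which is why temporal stability is reported separately from detection accuracy.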
Abstract
Reliable 4D object detection, which refers to 3D object detection in streaming video, is crucial for perceiving and understanding the real world. Existing open-set 4D object detection methods typically make predictions on a frame-by-frame basis without modeling temporal consistency, or rely on complex multi-stage pipelines that are prone to error propagation across cascaded stages. Progress in this area has been hindered by the lack of large-scale datasets that capture continuous reliable 3D bounding box (b-box) annotations. To overcome these challenges, we first introduce DA4D, a large-scale 4D detection dataset containing over 280k sequences with high-quality b-box annotations collected under diverse conditions. Building on DA4D, we propose DetAny4D, an open-set end-to-end framework that predicts 3D b-boxes directly from sequential inputs. DetAny4D fuses multi-modal features from…
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Face recognition and analysis
