3D Semantic Trajectory Reconstruction from 3D Pixel Continuum
Jae Shin Yoon, Ziwei Li, Hyun Soo Park

TL;DR
This paper introduces a novel method for reconstructing dense 3D semantic trajectories of human interactions from multiple synchronized videos, effectively handling occlusions and appearance changes to produce accurate semantic labeling.
Contribution
The paper proposes a new 3D semantic map representation and a view-pooling based reasoning method to improve semantic trajectory reconstruction from multiple views.
Findings
Outperforms baseline methods in predictive validity.
Provides robust semantic labels for large-scale real-world trajectories.
Effectively handles occlusion and appearance variations.
Abstract
This paper presents a method to reconstruct a dense semantic trajectory stream of human interactions in 3D from multiple synchronized videos. The interactions inherently introduce self-occlusion and illumination/appearance/shape changes, resulting in highly fragmented trajectory reconstruction with noisy and coarse semantic labels. Our conjecture is that among many views, there exists a set of views that can confidently recognize the visual semantic label of a 3D trajectory. We introduce a new representation called the 3D semantic map, a probability distribution over the semantic labels per trajectory. We construct the 3D semantic map by reasoning about visibility and 2D recognition confidence based on view-pooling, i.e., finding the view that best represents the semantics of the trajectory. Using the 3D semantic map, we precisely infer all trajectory labels jointly by considering the…
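The view-pooling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the pooling rule (pick the visible view whose strongest label probability is highest), the array layout, and the names `view_pooled_semantic_map`, `confidences`, and `visibility` are all assumptions made for illustration. The joint label inference that the abstract mentions afterwards is not covered here.

```python
import numpy as np

def view_pooled_semantic_map(confidences, visibility):
    """Hypothetical view-pooling sketch (assumed rule, not the paper's).

    confidences: array (V, T, L), per-view label probabilities for each
                 of T trajectories over L semantic labels.
    visibility:  boolean array (V, T), True where a view sees a trajectory.
    Returns (semantic_map, best_view): a (T, L) label distribution per
    trajectory and the index of the view it was pooled from.
    """
    conf = np.asarray(confidences, dtype=float)
    vis = np.asarray(visibility, dtype=bool)
    num_traj, num_labels = conf.shape[1], conf.shape[2]

    # Score each view by its most confident label; occluded views are
    # given -inf so they can never be selected over a visible one.
    peak = np.where(vis, conf.max(axis=2), -np.inf)   # shape (V, T)
    best_view = peak.argmax(axis=0)                   # shape (T,)

    # Pool: take the label distribution from the selected view.
    pooled = conf[best_view, np.arange(num_traj)]     # shape (T, L)

    # Trajectories seen by no view fall back to a uniform distribution.
    unseen = ~vis.any(axis=0)
    pooled[unseen] = 1.0 / num_labels

    # Renormalize so each row is a valid probability distribution.
    pooled = pooled / pooled.sum(axis=1, keepdims=True)
    return pooled, best_view

# Toy input: 3 views, 2 trajectories, 3 labels.
confidences = np.array([
    [[0.40, 0.30, 0.30], [0.90, 0.05, 0.05]],  # view 0
    [[0.10, 0.80, 0.10], [0.30, 0.30, 0.40]],  # view 1
    [[0.05, 0.05, 0.90], [0.20, 0.60, 0.20]],  # view 2
])
visibility = np.array([
    [True,  False],   # view 0 is occluded for trajectory 1
    [True,  True],
    [False, True],    # view 2 is occluded for trajectory 0
])

semantic_map, best_view = view_pooled_semantic_map(confidences, visibility)
labels = semantic_map.argmax(axis=1)
```

In this toy input, view 0 is highly confident about trajectory 1 but does not see it, so the visibility mask excludes it and the distribution is pooled from view 2 instead. That is the kind of occlusion handling the abstract attributes to visibility reasoning.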
