Understanding 3D Object Articulation in Internet Videos
Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey

TL;DR
This paper introduces a method for detecting and characterizing the 3D planar articulation of objects in ordinary videos. It combines top-down detection with optimization, is trained on a combination of videos and 3D scans, and achieves strong performance on challenging Internet videos and the Charades dataset.
Contribution
It presents an approach that integrates top-down detection of articulated planes with an optimization that solves for a 3D plane explaining a sequence of observed articulations. The system is trained on a combination of video and 3D scan datasets.
Findings
Effective detection of 3D object articulation in challenging videos
Strong performance on Internet and Charades datasets
Combines detection and optimization for accurate 3D plane estimation
Abstract
We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos. While seemingly easy for humans, this problem poses many challenges for computers. We propose to approach this problem by combining a top-down detection system that finds planes that can be articulated along with an optimization approach that solves for a 3D plane that can explain a sequence of observed articulations. We show that this system can be trained on a combination of videos and 3D scan datasets. When tested on a dataset of challenging Internet videos and the Charades dataset, our approach obtains strong performance. Project site: https://jasonqsy.github.io/Articulation3D
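The abstract says the optimization solves for a 3D plane that explains a sequence of observed articulations. As a rough illustration of why a sequence helps, the sketch below handles one simple case: a plane rotating about a fixed hinge, such as a door. It assumes per-frame unit plane normals are already available, estimates the shared rotation-axis direction from cross products of consecutive normals, and then recovers per-frame rotation angles. This is not the authors' optimization. The functions `estimate_hinge_axis` and `rotation_angles` and the cross-product averaging heuristic are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        raise ValueError("zero-length vector")
    return tuple(x / n for x in v)


def estimate_hinge_axis(normals: Sequence[Vec3]) -> Vec3:
    """Hypothetical helper: direction of a shared rotation axis (up to sign).

    Assumes every normal rotates about one fixed axis, so each normal is
    perpendicular to it and consecutive normals' cross products point along it.
    """
    if len(normals) < 2:
        raise ValueError("need at least two frames")
    unit = [normalize(n) for n in normals]
    acc = [0.0, 0.0, 0.0]
    ref = None
    for a, b in zip(unit, unit[1:]):
        c = cross(a, b)
        if dot(c, c) < 1e-12:
            continue  # no visible rotation between these two frames
        if ref is None:
            ref = c
        # Flip cross products that point the opposite way (e.g. door closing)
        # so they reinforce rather than cancel the running sum.
        sign = 1.0 if dot(c, ref) >= 0 else -1.0
        acc = [x + sign * y for x, y in zip(acc, c)]
    if ref is None:
        raise ValueError("normals never change; axis is unobservable")
    return normalize(acc)


def rotation_angles(normals: Sequence[Vec3], axis: Vec3) -> List[float]:
    """Signed angle (radians) of each normal relative to the first, about axis."""
    unit = [normalize(n) for n in normals]
    n0 = unit[0]
    # atan2 of sine (projected on the axis) and cosine gives a signed angle.
    return [math.atan2(dot(axis, cross(n0, n)), dot(n0, n)) for n in unit]
```

For example, normals `(cos t, sin t, 0)` for `t` in `0.0, 0.3, 0.6` yield an axis along `+z` and angles close to `[0.0, 0.3, 0.6]`. With noisy detections, a least-squares alternative is to take the eigenvector of the summed outer products of the normals with the smallest eigenvalue, since that direction is the one most nearly perpendicular to all of them.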
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
