PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos
Yihao Wang, Yang Miao, Wenshuai Zhao, Wenyan Yang, Zihan Wang, Joni Pajarinen, Luc Van Gool, Danda Pani Paudel, Juho Kannala, Xi Wang, Arno Solin

TL;DR
PAWS introduces a scalable method to extract object articulation data from egocentric videos, enhancing 3D understanding and robot manipulation without relying on extensive manual annotations.
Contribution
The paper presents PAWS, a novel approach that leverages in-the-wild egocentric videos for articulation perception, reducing dependence on high-quality 3D data and manual labels.
Findings
Achieved significant improvements over baselines on the HD-EPIC and Arti4D datasets.
The extracted articulations benefit downstream tasks, including fine-tuning 3D articulation prediction models and enabling robot manipulation.
Demonstrated scalability and effectiveness of the method.
Abstract
Articulation perception aims to recover the motion and structure of articulated objects (e.g., drawers and cupboards), and is fundamental to 3D scene understanding in robotics, simulation, and animation. Existing learning-based methods rely heavily on supervised training with high-quality 3D data and manual annotations, limiting scalability and diversity. To address this limitation, we propose PAWS, a method that directly extracts object articulations from hand-object interactions in large-scale in-the-wild egocentric videos. We evaluate our method on public datasets, including HD-EPIC and Arti4D, achieving significant improvements over baselines. We further demonstrate that the extracted articulations benefit downstream tasks, including fine-tuning 3D articulation prediction models and enabling robot manipulation. See the project website at…
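To make "motion and structure of articulated objects" concrete, here is a minimal sketch of how an articulation could be represented and applied to a point on the moving part. It is not the PAWS method and does not come from the paper: the `Articulation` class, the `move_point` function, and the two joint kinds are illustrative assumptions, using the common convention that a drawer is a prismatic (sliding) joint and a cupboard door is a revolute (hinged) joint.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Articulation:
    """Hypothetical container for one joint (names are assumptions)."""
    kind: str                         # "prismatic" (drawer) or "revolute" (door)
    axis: Vec3                        # unit direction of the joint axis
    origin: Vec3 = (0.0, 0.0, 0.0)    # a point on the axis (revolute only)


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def move_point(joint: Articulation, point: Vec3, state: float) -> Vec3:
    """Move a point on the articulated part.

    `state` is a displacement in metres for a prismatic joint and an
    angle in radians for a revolute joint.
    """
    if joint.kind == "prismatic":
        # Slide along the axis by `state`.
        return tuple(p + state * a for p, a in zip(point, joint.axis))
    if joint.kind == "revolute":
        # Rodrigues' rotation formula about the axis through `origin`.
        v = tuple(p - o for p, o in zip(point, joint.origin))
        c, s = math.cos(state), math.sin(state)
        k = joint.axis
        kxv = _cross(k, v)
        kdv = _dot(k, v)
        rotated = tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c)
                        for i in range(3))
        return tuple(r + o for r, o in zip(rotated, joint.origin))
    raise ValueError(f"unknown joint kind: {joint.kind!r}")


# Usage: a drawer pulled out 0.3 m, and a door swung open by 90 degrees.
drawer = Articulation(kind="prismatic", axis=(1.0, 0.0, 0.0))
door = Articulation(kind="revolute", axis=(0.0, 0.0, 1.0))
handle_after_pull = move_point(drawer, (0.0, 0.0, 0.0), 0.3)
handle_after_swing = move_point(door, (1.0, 0.0, 0.0), math.pi / 2)
```

Under this parameterization, perceiving an articulation amounts to estimating the joint kind, axis, and (for hinges) a point on the axis; how PAWS estimates these from hand-object interactions is described in the paper itself, not here.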
Taxonomy
Topics: Robot Manipulation and Learning · Hand Gesture Recognition Systems · Human Motion and Animation
