Fine-grained activity recognition for assembly videos
Jonathan D. Jones, Cathryn Cortesa, Amy Shelton, Barbara Landau, Sanjeev Khudanpur, and Gregory D. Hager

TL;DR
This paper introduces a comprehensive framework for fine-grained assembly action recognition that integrates spatial details and kinematic structures, demonstrating significant improvements on furniture and block-building datasets.
Contribution
It unifies assembly actions and kinematic structures into a single recognition framework and develops features that leverage spatial assembly structures.
Findings
Achieved 70% framewise accuracy on furniture assembly data.
Reduced normalized edit distance by 69% on block-building data.
Demonstrated the effectiveness of spatially-aware features in fine-grained recognition.
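The two metrics named above can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that predictions and references are per-frame label sequences, that edit distance is computed on segment-level sequences (consecutive duplicate labels collapsed), and that normalization divides by the longer of the two segment sequences. The paper's exact normalization is not given in this summary, and all function names here are hypothetical.

```python
from typing import Hashable, List, Sequence


def framewise_accuracy(predicted: Sequence[Hashable],
                       reference: Sequence[Hashable]) -> float:
    """Fraction of frames whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same number of frames")
    if not reference:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def collapse_repeats(labels: Sequence[Hashable]) -> List[Hashable]:
    """Turn a per-frame label sequence into a segment-level sequence."""
    segments: List[Hashable] = []
    for label in labels:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted: Sequence[Hashable],
                             reference: Sequence[Hashable]) -> float:
    """Segment-level edit distance divided by the longer segment sequence.

    Assumption: this normalization is one common choice, not necessarily
    the one used in the paper.
    """
    p = collapse_repeats(predicted)
    r = collapse_repeats(reference)
    longest = max(len(p), len(r))
    if longest == 0:
        return 0.0
    return edit_distance(p, r) / longest
```

Framewise accuracy rewards getting each frame right, while the edit distance only looks at the order of predicted segments, so a prediction with slightly misplaced boundaries can score well on the second metric and lose points on the first.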
Abstract
In this paper we address the task of recognizing assembly actions as a structure (e.g. a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires perception at a level of spatial detail that has not been attempted in the action recognition literature to date. We extend the fine-grained activity recognition setting to address the task of assembly action recognition in its full generality by unifying assembly actions and kinematic structures within a single framework. We use this framework to develop a general method for recognizing assembly actions from observation sequences, along with observation features that take advantage of a spatial assembly's special structure. Finally, we evaluate our method empirically on two application-driven data sources: (1) An IKEA furniture-assembly dataset, and (2) A…
