Fine-grained activity recognition for assembly videos

Jonathan D. Jones; Cathryn Cortesa; Amy Shelton; Barbara Landau; Sanjeev Khudanpur; and Gregory D. Hager

arXiv:2012.01392 · cs.CV · December 3, 2020

TL;DR

This paper introduces a framework for fine-grained assembly action recognition that models assembly actions and kinematic structures together and uses spatially aware observation features. It reports 70% framewise accuracy on furniture-assembly data and a 69% reduction in normalized edit distance on block-building data.

Contribution

The paper unifies assembly actions and kinematic structures in a single recognition framework and develops observation features that exploit the structure of spatial assemblies.

Findings

01. Achieved 70% framewise accuracy on furniture assembly data.

02. Reduced normalized edit distance by 69% on block-building data.

03. Demonstrated the effectiveness of spatially-aware features in fine-grained recognition.
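
The two metrics named in the findings can be illustrated with a short sketch. This is not the authors' evaluation code: the function names are hypothetical, and the normalization used here (segment-level Levenshtein distance divided by the longer segment sequence) is an assumption, since this page does not state which convention the paper follows.

```python
from typing import Hashable, List, Sequence


def framewise_accuracy(pred: Sequence[Hashable], true: Sequence[Hashable]) -> float:
    """Fraction of frames whose predicted label equals the reference label."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have one label per frame")
    if len(true) == 0:
        raise ValueError("sequences must be non-empty")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def collapse_segments(labels: Sequence[Hashable]) -> List[Hashable]:
    """Collapse runs of identical consecutive labels into one label per segment."""
    segments: List[Hashable] = []
    for label in labels:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: Sequence[Hashable], true: Sequence[Hashable]) -> float:
    """Segment-level edit distance, scaled to [0, 1].

    Assumption: normalize by the longer of the two segment sequences.
    """
    p, t = collapse_segments(pred), collapse_segments(true)
    denom = max(len(p), len(t))
    return levenshtein(p, t) / denom if denom else 0.0
```

For example, with reference labels `["a", "a", "b", "b", "c", "c"]` and predictions `["a", "a", "b", "c", "c", "c"]`, one of six frames is mislabeled, yet both sequences collapse to the same segment order, so the edit distance is zero. The two metrics are therefore complementary: framewise accuracy penalizes boundary timing errors, while edit distance penalizes missing, extra, or misordered actions.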

Abstract

In this paper we address the task of recognizing assembly actions as a structure (e.g. a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires perception at a level of spatial detail that has not been attempted in the action recognition literature to date. We extend the fine-grained activity recognition setting to address the task of assembly action recognition in its full generality by unifying assembly actions and kinematic structures within a single framework. We use this framework to develop a general method for recognizing assembly actions from observation sequences, along with observation features that take advantage of a spatial assembly's special structure. Finally, we evaluate our method empirically on two application-driven data sources: (1) An IKEA furniture-assembly dataset, and (2) A…
