Joint Discovery of Object States and Manipulation Actions

Jean-Baptiste Alayrac; Josef Sivic; Ivan Laptev; Simon Lacoste-Julien

arXiv:1702.02738 · cs.CV · August 29, 2017 · 6 cites

TL;DR

This paper presents a joint model that automatically discovers object states and localizes the manipulation actions that change them in videos of a given task, without additional supervision.

Contribution

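The paper formulates the joint discovery of object states and state-modifying actions as a discriminative clustering cost with constraints. It assumes a consistent temporal order for state changes and manipulation actions, and introduces new optimization techniques to learn the model parameters.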

Findings

01 Discovered seven manipulation actions and the corresponding object states on a new dataset of videos depicting real-life object manipulations.

02 Joint modeling improves the accuracy of both object state discovery and action localization.

03 The approach operates without additional supervision, relying on an assumed consistent temporal order of object states and actions.

Abstract

Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost with constraints. We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision. We demonstrate successful discovery of seven manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations.…
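The two ingredients named in the abstract, a discriminative clustering cost and a temporal order constraint, can be illustrated with a toy sketch. This is not the authors' implementation: the ridge-regression form of the cost, the single change point between two states, the brute-force search, and every name below (`ridge_clustering_cost`, `labeling_for_change_point`, `best_change_point`, `lam`) are assumptions made for illustration only.

```python
import numpy as np


def ridge_clustering_cost(X, Y, lam=1e-2):
    """Cost of a candidate labeling Y (N x K) given features X (N x d).

    A linear classifier W is fit in closed form by ridge regression, and the
    cost is the regularized squared error. A low cost means the labeling is
    easy to predict from the features. (Assumed form, for illustration.)
    """
    n, d = X.shape
    # Minimizer of ||Y - XW||^2 / (2N) + lam/2 * ||W||^2.
    W = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)
    resid = Y - X @ W
    return (resid ** 2).sum() / (2 * n) + lam / 2 * (W ** 2).sum()


def labeling_for_change_point(n_frames, t):
    """Ordering constraint: frames before t are state 1, the rest state 2."""
    Y = np.zeros((n_frames, 2))
    Y[:t, 0] = 1.0
    Y[t:, 1] = 1.0
    return Y


def best_change_point(X, lam=1e-2):
    """Brute-force search over all labelings that respect the ordering."""
    n = X.shape[0]
    costs = {
        t: ridge_clustering_cost(X, labeling_for_change_point(n, t), lam)
        for t in range(1, n)  # both states must be non-empty
    }
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Hypothetical per-frame features: 6 frames of one appearance, then 4 of another.
    X = np.vstack([np.tile([1.0, 0.0], (6, 1)), np.tile([0.0, 1.0], (4, 1))])
    print("estimated change point:", best_change_point(X))
```

Restricting the search to labelings in which state 1 precedes state 2 is what makes the problem tractable without labels: only the change point is unknown. The brute-force enumeration stands in for the optimization techniques the abstract mentions, which are not described on this page.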

Code & Models

Repositories

jalayrac/object-states-action (Official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods