Learning Human Activities and Object Affordances from RGB-D Videos
Hema Swetha Koppula, Rudhir Gupta, Ashutosh Saxena

TL;DR
This paper presents a method for jointly recognizing human sub-activities and object affordances from RGB-D videos. It models both as a Markov random field learned with a structural support vector machine (SSVM), so that robots can understand and assist people in human environments.
Contribution
It jointly models sub-activities and object affordances in a single Markov random field and learns the model with a structural SVM, treating labelings over alternate temporal segmentations as latent variables.
Findings
Achieved 79.4% accuracy in affordance recognition.
Obtained 63.4% accuracy in sub-activity recognition.
Demonstrated robot-assisted tasks using the descriptive labels.
Abstract
Understanding human activities and object affordances are two very important skills, especially for personal robots which operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human, and more importantly, of their interactions with the objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our…
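The joint labeling idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it covers a single temporal segment only, omits temporal edges, latent segmentations, and SSVM learning, and uses hand-set scores. The label sets and the names `score`, `best_labeling`, `obj_scores`, `sub_scores`, and `pair_scores` are all hypothetical. It shows only how node terms and object/sub-activity edge terms combine into one score that is maximized over joint labelings.

```python
from itertools import product

# Hypothetical label sets, for illustration only.
AFFORDANCES = ["reachable", "movable", "stationary"]
SUB_ACTIVITIES = ["reaching", "moving"]


def score(obj_labels, sub_label, obj_scores, sub_scores, pair_scores):
    """Score one joint labeling: node terms plus object/sub-activity edge terms."""
    total = sub_scores[sub_label]
    for i, affordance in enumerate(obj_labels):
        total += obj_scores[i][affordance]                       # object node term
        total += pair_scores.get((affordance, sub_label), 0.0)   # edge term
    return total


def best_labeling(obj_scores, sub_scores, pair_scores):
    """Exhaustive argmax over joint labelings (feasible only for tiny graphs)."""
    candidates = (
        (objs, sub)
        for objs in product(AFFORDANCES, repeat=len(obj_scores))
        for sub in SUB_ACTIVITIES
    )
    return max(
        candidates,
        key=lambda c: score(c[0], c[1], obj_scores, sub_scores, pair_scores),
    )


# Hand-set toy scores (assumed values, not learned weights).
obj_scores = [
    {"reachable": 1.0, "movable": 0.8, "stationary": 0.1},
    {"reachable": 0.2, "movable": 0.1, "stationary": 0.9},
]
sub_scores = {"reaching": 0.5, "moving": 0.6}
pair_scores = {("reachable", "reaching"): 0.5, ("movable", "moving"): 0.5}

labels, sub = best_labeling(obj_scores, sub_scores, pair_scores)
print(labels, sub)
```

In this toy setup the sub-activity node alone would prefer "moving", but the edge term linking it to the first object's "reachable" affordance makes "reaching" the better joint choice, which is the kind of interaction a joint model is meant to capture.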
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Anomaly Detection Techniques and Applications
