Learning Asynchronous and Sparse Human-Object Interaction in Videos
Romero Morais, Vuong Le, Svetha Venkatesh, Truyen Tran

TL;DR
This paper introduces ASSIGN, a recurrent graph network that automatically detects and models the asynchronous and sparse interactions in videos, improving human-object interaction recognition without external segmentation.
Contribution
The paper presents a novel graph network model that learns the dynamic, asynchronous, and sparse interactions in videos for human-object activity recognition.
Findings
Superior performance in segmenting and labeling sub-activities.
Eliminates the need for external segmentation.
Effective modeling of asynchronous and sparse interactions.
Abstract
Human activities can be learned from video. With effective modeling, it is possible to discover not only the action labels but also the temporal structures of the activities, such as the progression of the sub-activities. Automatically recognizing such structure from raw video signal is a new capability that promises authentic modeling and successful recognition of human-object interactions. Toward this goal, we introduce Asynchronous-Sparse Interaction Graph Networks (ASSIGN), a recurrent graph network that is able to automatically detect the structure of interaction events associated with entities in a video scene. ASSIGN pioneers learning of the autonomous behavior of video entities, including their dynamic structure and their interaction with coexisting neighbors. Entities' lives in our model are asynchronous to those of others and are therefore more flexible in adapting to complex…
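To make "asynchronous" and "sparse" concrete, here is a toy sketch in plain Python. It is not the ASSIGN architecture: the gating rule, the top-k neighbor choice, the 0.5/0.5 mixing weights, and the names `step`, `sparse_neighbors`, `k`, and `threshold` are all assumptions made for illustration. Each entity decides on its own whether to update at a given time step (asynchronous), and an updating entity reads messages from only a few neighbors (sparse).

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_neighbors(states, i, k):
    """Indices of the k entities whose states score highest against entity i.

    Hypothetical selection rule: plain dot-product similarity, excluding i.
    """
    scores = [(dot(states[i], s), j) for j, s in enumerate(states) if j != i]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def step(states, inputs, k=1, threshold=0.5):
    """Advance all entities by one time step.

    An entity updates only when its new input differs enough from its
    current state (a hypothetical stand-in for a learned boundary detector).
    Entities that do not update simply carry their state forward.
    """
    new_states, updated = [], []
    for i, (h, x) in enumerate(zip(states, inputs)):
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, x)))
        if change < threshold:
            new_states.append(list(h))  # skip: this entity stays in its segment
            updated.append(False)
            continue
        nbrs = sparse_neighbors(states, i, k)
        if nbrs:
            # Average the states of the selected neighbors only.
            msg = [sum(states[j][d] for j in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            msg = [0.0] * len(h)
        # Mix the entity's own input with the sparse neighbor message.
        new_states.append([0.5 * xd + 0.5 * md for xd, md in zip(x, msg)])
        updated.append(True)
    return new_states, updated


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inputs = [[1.0, 0.1], [1.0, 1.0], [1.0, 1.0]]
new_states, updated = step(states, inputs, k=1, threshold=0.5)
print(updated)
```

In this example only the second entity sees a large enough change to update, so the three entities advance on different schedules. In a learned model, the update decision and the neighbor selection would be produced by trained components rather than fixed thresholds.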
