TL;DR
This paper introduces a variational graph autoencoder that jointly recognizes and predicts manipulation actions from symbolic scene graphs, offering an efficient alternative to CNN-based methods for understanding human activities in robotics.
Contribution
The novel deep graph autoencoder processes symbolic scene graphs for manipulation action recognition and prediction, outperforming existing CNN-based approaches.
Findings
Achieves better performance than state-of-the-art methods on the MANIAC and MSRC-9 datasets.
Utilizes a variational autoencoder structure with dual branches for recognition and prediction.
Operates on symbolic scene graphs rather than raw Euclidean data such as RGB images, which reduces computational cost relative to CNN-based processing.
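To make the dual-branch idea concrete, the following is a minimal sketch, not the authors' architecture. It assumes one GCN-style message-passing layer, a mean readout over nodes, a Gaussian latent with the reparameterization trick, and two linear softmax heads (one for recognition, one for prediction). The object names, layer sizes, number of action classes, and random weights are all hypothetical, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy scene graph: nodes are objects, edges are symbolic
# "touching" relations. None of these names come from the paper.
NODE_LABELS = ["hand", "knife", "cucumber"]
EDGES = [(0, 1), (1, 2)]   # hand-knife, knife-cucumber
NUM_ACTIONS = 4            # assumed number of action classes


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN-style normalization."""
    a = np.eye(num_nodes)                      # self-loops
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0                # undirected relation
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    e = np.exp(z - z.max())                    # shift for numerical stability
    return e / e.sum()


def init_params(in_dim, hidden, latent, num_actions, scale=0.1):
    """Random (untrained) weights; shapes are illustrative assumptions."""
    return {
        "w_enc": rng.normal(0, scale, (in_dim, hidden)),
        "w_mu": rng.normal(0, scale, (hidden, latent)),
        "w_logvar": rng.normal(0, scale, (hidden, latent)),
        "w_rec": rng.normal(0, scale, (latent, num_actions)),
        "w_pred": rng.normal(0, scale, (latent, num_actions)),
    }


def forward(params, x, a_hat):
    # Encoder: one round of message passing followed by ReLU.
    h = np.maximum(a_hat @ x @ params["w_enc"], 0.0)
    g = h.mean(axis=0)                         # graph-level readout
    mu = g @ params["w_mu"]
    logvar = g @ params["w_logvar"]
    # Reparameterization trick: z = mu + sigma * eps.
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    # Two branches share the latent code.
    p_recognition = softmax(z @ params["w_rec"])    # current action
    p_prediction = softmax(z @ params["w_pred"])    # upcoming action
    # KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    return p_recognition, p_prediction, kl


x = np.eye(len(NODE_LABELS))                   # one-hot node features
a_hat = normalized_adjacency(len(NODE_LABELS), EDGES)
params = init_params(x.shape[1], hidden=8, latent=4, num_actions=NUM_ACTIONS)
p_rec, p_pred, kl = forward(params, x, a_hat)
```

The input here is a 3x3 adjacency matrix and a 3x3 feature matrix rather than an image tensor, which illustrates why operating on symbolic graphs can be far cheaper than convolving over raw pixels. In a trained model, the two cross-entropy terms and the KL term would typically be combined into a single loss; how the paper weights them is not stated in this summary.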
Abstract
Despite decades of research, understanding human manipulation activities is, and has always been, one of the most attractive and challenging research topics in computer vision and robotics. Recognition and prediction of observed human manipulation actions have their roots in the applications related to, for instance, human-robot interaction and robot learning from demonstration. The current research trend heavily relies on advanced convolutional neural networks to process the structured Euclidean data, such as RGB camera images. These networks, however, come with immense computational complexity to be able to process high dimensional raw data. Different from the related works, we here introduce a deep graph autoencoder to jointly learn recognition and prediction of manipulation tasks from symbolic scene graphs, instead of relying on the structured Euclidean data. Our network has a…
