TL;DR
Fusion-GCN is a multimodal graph convolutional network that fuses RGB, inertial, and skeleton data into a single graph for action recognition. It reaches comparable results on UTD-MHAD and improves the MMACT baseline by up to 12.37% (F1-Measure).
Contribution
The paper is presented as the first to incorporate multiple sensor modalities into a GCN framework for multimodal action recognition, adding them to the graph either as extra node attributes or as new nodes.
Findings
Achieved comparable results on the UTD-MHAD dataset.
Improved on the MMACT baseline by up to 12.37% (F1-Measure).
Demonstrated flexible fusion of RGB sequences, inertial measurements, and skeleton sequences.
Abstract
In this paper, we present Fusion-GCN, an approach for multimodal action recognition using Graph Convolutional Networks (GCNs). Action recognition methods based on GCNs have recently yielded state-of-the-art performance for skeleton-based action recognition. With Fusion-GCN, we propose to integrate various sensor data modalities into a graph that is trained using a GCN model for multimodal action recognition. Additional sensor measurements are incorporated into the graph representation, either on the channel dimension (introducing additional node attributes) or on the spatial dimension (introducing new nodes). Fusion-GCN was evaluated on two publicly available datasets, UTD-MHAD and MMACT, and demonstrates flexible fusion of RGB sequences, inertial measurements, and skeleton sequences. Our approach achieves comparable results on the UTD-MHAD dataset and improves the baseline on the…
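The two fusion options named in the abstract (extra node attributes versus extra nodes) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the tensor layout `(C, T, V)` (channels, frames, joints), the function names `fuse_channel` and `fuse_spatial`, and the choice to connect the new sensor node to a single joint via `attach_to` are all assumptions made for illustration.

```python
import numpy as np


def fuse_channel(skeleton, imu):
    """Channel-dimension fusion (hypothetical helper).

    skeleton: array of shape (C, T, V); imu: array of shape (C_imu, T).
    Every joint receives the same IMU reading as additional node attributes.
    Returns an array of shape (C + C_imu, T, V).
    """
    c_imu, t = imu.shape
    v = skeleton.shape[2]
    # Repeat the IMU signal across all V joints without changing its values.
    imu_per_joint = np.broadcast_to(imu[:, :, None], (c_imu, t, v))
    return np.concatenate([skeleton, imu_per_joint], axis=0)


def fuse_spatial(skeleton, adjacency, imu, attach_to):
    """Spatial-dimension fusion (hypothetical helper).

    The IMU becomes one new graph node, so it must have the same number of
    channels as a joint. The new node is linked to joint `attach_to`
    (an assumed design choice, e.g. the joint nearest the sensor).
    Returns features of shape (C, T, V + 1) and adjacency (V + 1, V + 1).
    """
    if imu.shape[0] != skeleton.shape[0]:
        raise ValueError("IMU and skeleton need the same channel count")
    fused = np.concatenate([skeleton, imu[:, :, None]], axis=2)

    v = adjacency.shape[0]
    new_adj = np.zeros((v + 1, v + 1), dtype=adjacency.dtype)
    new_adj[:v, :v] = adjacency          # keep the original skeleton edges
    new_adj[v, attach_to] = 1            # undirected edge: sensor <-> joint
    new_adj[attach_to, v] = 1
    return fused, new_adj
```

Channel fusion leaves the graph topology untouched and only widens the per-node feature vector, while spatial fusion keeps the feature width and grows the graph, which is why the latter also has to extend the adjacency matrix.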
Taxonomy
Methods: Graph Convolutional Network
