Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition
Yuhang Wen, Zixuan Tang, Yunsheng Pang, Beichen Ding, Mengyuan Liu

TL;DR
The paper introduces ISTA-Net, a novel neural network that models spatial, temporal, and interactive relations in skeleton-based action recognition, outperforming existing methods and handling diverse interacting entities effectively.
Contribution
It proposes a unified spatiotemporal token attention network with entity rearrangement for better interactive action recognition.
Findings
Outperforms state-of-the-art methods on four datasets.
Effectively models diverse interacting entities.
Demonstrates robustness in recognizing interactive actions.
Abstract
Recognizing interactive actions plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanisms to capture interactive relations, which have limited learning capability or adapt inefficiently to more interacting entities. Under the assumption that the priors of each entity are already known, they also lack evaluation in a more general setting that addresses the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer to partition Interactive Spatiotemporal Tokens (ISTs), which are a unified way to represent the motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To jointly learn…
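The tokenization idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a skeleton tensor laid out as (entities, frames, joints, channels), non-overlapping temporal windows, and a single unparameterized self-attention step. The function names `tokenize` and `self_attention` and the `window` parameter are hypothetical. The point it shows is that once every entity's joints are placed in one token sequence, a single attention map can relate tokens across joints, time windows, and entities at once.

```python
import numpy as np

def tokenize(x, window):
    """Split a skeleton sequence into tokens (illustrative layout, assumed).

    x: array of shape (E, T, J, C) = entities, frames, joints, channels.
    Each token holds one joint of one entity over `window` consecutive frames.
    Returns an array of shape (T // window * E * J, window * C).
    """
    E, T, J, C = x.shape
    assert T % window == 0, "T must be divisible by the window length"
    n = T // window
    # (E, n, window, J, C) -> (n, E, J, window, C)
    x = x.reshape(E, n, window, J, C).transpose(1, 0, 3, 2, 4)
    return x.reshape(n * E * J, window * C)

def self_attention(tokens):
    """Plain scaled dot-product self-attention with no learned weights."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 5, 3))   # 2 entities, 8 frames, 5 joints, 3D coords
tokens = tokenize(x, window=4)           # 2 windows * 2 entities * 5 joints tokens
out, weights = self_attention(tokens)
print(tokens.shape, out.shape, weights.shape)
```

Because the entity axis is folded into the token sequence, adding a third entity only lengthens the sequence; nothing in the attention step changes.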
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
