LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition
Naga Venkata Sai Raviteja Chappa, Khoa Luu

TL;DR
LiGAR is a hierarchical transformer that integrates LiDAR, visual, and textual data for multi-modal group activity recognition, reporting state-of-the-art results on the JRDB-PAR, Volleyball, and NBA datasets.
Contribution
The paper presents a novel LiDAR-guided hierarchical transformer architecture that enhances multi-modal group activity recognition by capturing multi-scale spatial and semantic information.
Findings
Achieves up to 10.6% improvement in F1-score on JRDB-PAR
Improves Mean Per Class Accuracy by 5.9% on NBA dataset
Maintains high performance without LiDAR data during inference
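For readers unfamiliar with the NBA metric, Mean Per Class Accuracy is commonly computed as the unweighted average of per-class recall, so rare activity classes count as much as frequent ones. The sketch below illustrates that common definition only; the function names and the toy labels ("pass", "shoot") are hypothetical, and the paper's exact evaluation protocol is not given on this page.

```python
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per ground-truth class
    correct = defaultdict(int)  # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def overall_accuracy(y_true, y_pred):
    """Plain fraction of correct predictions, for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels: 8 "pass" clips and 2 "shoot" clips.
y_true = ["pass"] * 8 + ["shoot"] * 2
y_pred = ["pass"] * 8 + ["pass", "shoot"]

mpca = mean_per_class_accuracy(y_true, y_pred)  # (8/8 + 1/2) / 2
acc = overall_accuracy(y_true, y_pred)          # 9/10
```

On this toy input the overall accuracy is 0.9 while the mean per class accuracy is 0.75, which shows why the per-class figure is the stricter one on imbalanced activity datasets.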
Abstract
Group Activity Recognition (GAR) remains challenging in computer vision due to the complex nature of multi-agent interactions. This paper introduces LiGAR, a LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition. LiGAR leverages LiDAR data as a structural backbone to guide the processing of visual and textual information, enabling robust handling of occlusions and complex spatial arrangements. Our framework incorporates a Multi-Scale LiDAR Transformer, Cross-Modal Guided Attention, and an Adaptive Fusion Module to integrate multi-modal data at different semantic levels effectively. LiGAR's hierarchical architecture captures group activities at various granularities, from individual actions to scene-level dynamics. Extensive experiments on the JRDB-PAR, Volleyball, and NBA datasets demonstrate LiGAR's superior performance, achieving state-of-the-art results…
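The abstract names a Cross-Modal Guided Attention component but does not specify its form here. One plausible reading is standard scaled dot-product cross-attention in which LiDAR tokens act as queries over visual tokens. The sketch below illustrates that generic mechanism only; it is not the authors' implementation, it omits learned projections and multiple heads, and `guided_attention` and the toy feature vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def guided_attention(lidar_tokens, visual_tokens):
    """Scaled dot-product cross-attention.

    Assumption: LiDAR tokens are the queries; visual tokens serve as both
    keys and values. All tokens share the same feature dimension d.
    Returns (outputs, weights), one row per LiDAR token.
    """
    d = len(lidar_tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in lidar_tokens:
        # Similarity of this LiDAR query to every visual key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in visual_tokens]
        w = softmax(scores)
        # Weighted sum of visual values.
        out = [sum(wj * v[i] for wj, v in zip(w, visual_tokens))
               for i in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical 2-D features: two LiDAR tokens, three visual tokens.
lidar = [[1.0, 0.0], [0.0, 1.0]]
visual = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
outputs, weights = guided_attention(lidar, visual)
```

Each row of `weights` sums to 1, and each LiDAR token places the most weight on the visual token it aligns with best, which is the sense in which one modality can guide how another is read.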
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Multi-Head Attention · Softmax · Adam
