MSGL-Transformer: A Multi-Scale Global-Local Transformer for Rodent Social Behavior Recognition
Muhammad Imran Sharif, Doina Caragea

TL;DR
The paper introduces MSGL-Transformer, a multi-scale global-local transformer that recognizes rodent social behaviors from pose sequences. It reports higher accuracy than the compared baselines on two datasets, RatSI and CalMS21.
Contribution
The paper proposes a multi-scale attention transformer architecture with a Behavior-Aware Modulation (BAM) block, and reports higher accuracy than the compared baselines on rodent behavior recognition.
Findings
Achieves 75.4% accuracy on the RatSI dataset, outperforming TCN, LSTM, and Bi-LSTM.
Achieves 87.1% accuracy on the CalMS21 dataset, improving over HSTWFormer and other models.
The architecture generalizes across datasets with minimal adjustments.
Abstract
Recognition of rodent behavior is important for understanding neural and behavioral mechanisms. Traditional manual scoring is time-consuming and prone to human error. We propose MSGL-Transformer, a Multi-Scale Global-Local Transformer for recognizing rodent social behaviors from pose-based temporal sequences. The model employs a lightweight transformer encoder with multi-scale attention to capture motion dynamics across different temporal scales. The architecture integrates parallel short-range, medium-range, and global attention branches to explicitly capture behavior dynamics at multiple temporal scales. We also introduce a Behavior-Aware Modulation (BAM) block, inspired by SE-Networks, which modulates temporal embeddings to emphasize behavior-relevant features prior to attention. We evaluate on two datasets: RatSI (5 behavior classes, 12D pose inputs) and CalMS21 (4 behavior classes,…
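The two ideas named in the abstract, an SE-style gate applied before attention and parallel attention branches over short, medium, and global temporal ranges, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The window sizes (2 and 8 frames), the bottleneck width, the identity query/key/value projections, and the averaging used to fuse the branches are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries get weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def band_mask(num_frames, window):
    # Boolean (T, T) mask: frame i may attend to frame j iff |i - j| <= window.
    # window=None means no restriction (the global branch).
    if window is None:
        return np.ones((num_frames, num_frames), dtype=bool)
    idx = np.arange(num_frames)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def masked_self_attention(x, mask):
    # Single-head scaled dot-product self-attention.
    # ASSUMPTION: identity Q/K/V projections, to keep the sketch short.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ x

def se_style_gate(x, w1, w2):
    # SE-style modulation: squeeze over time, excite through a small
    # bottleneck, then rescale each feature channel by a gate in [0, 1].
    squeezed = x.mean(axis=0)                      # (D,)
    hidden = np.maximum(0.0, squeezed @ w1)        # (D_bottleneck,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # (D,)
    return x * gate, gate

def multi_scale_attention(x, windows=(2, 8, None)):
    # Run short-range, medium-range, and global branches in parallel.
    # ASSUMPTION: branches are fused by a plain average.
    branches = [masked_self_attention(x, band_mask(len(x), w)) for w in windows]
    return np.mean(branches, axis=0)

# Usage: a 16-frame sequence of 12-D pose vectors (12-D matches RatSI inputs).
rng = np.random.default_rng(0)
poses = rng.normal(size=(16, 12))
w1 = 0.1 * rng.normal(size=(12, 4))   # hypothetical bottleneck of width 4
w2 = 0.1 * rng.normal(size=(4, 12))
gated, gate = se_style_gate(poses, w1, w2)
features = multi_scale_attention(gated)
print(features.shape)
```

A trained model would use learned projections, multiple heads, and a learned fusion of the branches; the sketch only shows how band-limited masks give each branch a different temporal receptive field while the gate rescales feature channels beforehand.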
