Dense Interaction Learning for Video-based Person Re-identification
Tianyu He, Xin Jin, Xu Shen, Jianqiang Huang, Zhibo Chen, Xian-Sheng Hua

TL;DR
This paper introduces Dense Interaction Learning (DenseIL), a hybrid CNN and Attention-based framework that enhances video-based person re-identification by modeling multi-scale spatial-temporal features and interactions.
Contribution
The paper proposes DenseIL, a framework that combines a CNN encoder with an attention-based Dense Interaction (DI) decoder and adds STEP-Emb, a spatio-temporal positional embedding, to improve video-based re-ID performance.
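The page names STEP-Emb but does not define it. To illustrate what a spatial-temporal positional embedding can look like, the sketch below builds a hypothetical factorized variant with NumPy: half of each token's code is the standard sinusoidal encoding (as in Attention Is All You Need) of its frame index, and the other half encodes its flattened spatial index. This is an assumption for illustration, not the paper's STEP-Emb construction, and `sinusoidal` and `spatial_temporal_embedding` are hypothetical names.

```python
import numpy as np


def sinusoidal(positions: np.ndarray, dim: int) -> np.ndarray:
    """Sinusoidal encoding from "Attention Is All You Need"; returns (len(positions), dim)."""
    assert dim % 2 == 0, "dim must be even"
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / dim))            # one frequency per sin/cos pair
    angles = positions[:, None] * freqs[None, :]        # (n, dim/2)
    pe = np.zeros((len(positions), dim))
    pe[:, 0::2] = np.sin(angles)                        # even channels: sine
    pe[:, 1::2] = np.cos(angles)                        # odd channels: cosine
    return pe


def spatial_temporal_embedding(T: int, H: int, W: int, dim: int) -> np.ndarray:
    """Hypothetical factorized embedding (NOT the paper's STEP-Emb definition).

    The first half of each code encodes the frame index t, the second half the
    flattened spatial index h * W + w. Output is (T * H * W, dim), ordered
    frame-major: all H*W tokens of frame 0, then frame 1, and so on.
    """
    assert dim % 4 == 0, "dim must split into two even halves"
    half = dim // 2
    temporal = sinusoidal(np.arange(T, dtype=float), half)      # (T, half)
    spatial = sinusoidal(np.arange(H * W, dtype=float), half)   # (H*W, half)
    emb = np.concatenate(
        [np.broadcast_to(temporal[:, None, :], (T, H * W, half)),
         np.broadcast_to(spatial[None, :, :], (T, H * W, half))],
        axis=-1,
    )                                                           # (T, H*W, dim)
    return emb.reshape(T * H * W, dim)


# Usage: 4 frames of a 2x3 feature map, 8-dimensional codes.
T, H, W, dim = 4, 2, 3, 8
emb = spatial_temporal_embedding(T, H, W, dim)
print(emb.shape)  # (24, 8)
```

Concatenating the two halves, rather than summing two encodings that share the same frequencies, keeps tokens such as (frame 1, position 0) and (frame 0, position 1) distinct.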
Findings
Outperforms state-of-the-art methods on multiple video-based re-ID datasets
Effectively models multi-grained spatial-temporal features
Demonstrates significant accuracy improvements
Abstract
Video-based person re-identification (re-ID) aims to match the same person across video clips. Efficiently exploiting multi-scale fine-grained features while building the structural interactions among them is pivotal to its success. In this paper, we propose a hybrid framework, Dense Interaction Learning (DenseIL), that combines the principal advantages of both CNN-based and Attention-based architectures to address the difficulties of video-based person re-ID. DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder. The CNN encoder is responsible for efficiently extracting discriminative spatial features, while the DI decoder is designed to densely model the inherent spatial-temporal interactions across frames. Different from previous works, we additionally let the DI decoder densely attend to intermediate fine-grained CNN features, which naturally yields multi-grained spatial-temporal…
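The abstract describes the DI decoder as densely attending to intermediate CNN features from every frame. Below is a minimal single-head sketch of that idea in NumPy, under several assumptions: every CNN stage has already been projected to a common channel width C, the decoder queries are the last-stage tokens, and only one cross-attention step is shown (the paper's decoder has further components not reproduced here). The helpers `flatten_multiscale` and `dense_cross_attention` are hypothetical names.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def flatten_multiscale(feature_maps: list) -> np.ndarray:
    """Turn a list of per-stage maps, each (T, C, H_s, W_s), into one token set.

    Assumes every stage was already projected to the same channel width C.
    Returns (sum over stages of T * H_s * W_s, C).
    """
    tokens = []
    for fm in feature_maps:
        T, C, H, W = fm.shape
        tokens.append(fm.transpose(0, 2, 3, 1).reshape(T * H * W, C))  # (T*H*W, C)
    return np.concatenate(tokens, axis=0)


def dense_cross_attention(queries, memory, Wq, Wk, Wv):
    """Single-head scaled dot-product attention of queries (N, C) over memory (M, C)."""
    q, k, v = queries @ Wq, memory @ Wk, memory @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M)
    weights = softmax(scores, axis=-1)        # each query spreads weight over all frames and stages
    return weights @ v, weights


# Usage: 4 frames, two hypothetical CNN stages at different resolutions.
rng = np.random.default_rng(0)
T, C = 4, 16
stages = [rng.standard_normal((T, C, 8, 4)),   # finer intermediate stage
          rng.standard_normal((T, C, 4, 2))]   # coarser last stage
memory = flatten_multiscale(stages)            # (4*8*4 + 4*4*2, 16) = (160, 16)
queries = flatten_multiscale([stages[-1]])     # last-stage tokens as queries: (32, 16)
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, weights = dense_cross_attention(queries, memory, Wq, Wk, Wv)
print(out.shape, weights.shape)  # (32, 16) (32, 160)
```

Because every query scores every token from every frame and every stage, the attention weights form a dense (queries × all tokens) matrix; in this sketch, that is where the multi-grained spatial-temporal interaction comes from.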
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Human Pose and Action Recognition
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Dropout · Residual Connection
