Dense Interaction Learning for Video-based Person Re-identification

Tianyu He; Xin Jin; Xu Shen; Jianqiang Huang; Zhibo Chen; Xian-Sheng Hua

arXiv:2103.09013·cs.CV·August 18, 2021

TL;DR

This paper introduces Dense Interaction Learning (DenseIL), a hybrid CNN and Attention-based framework that enhances video-based person re-identification by modeling multi-scale spatial-temporal features and interactions.

Contribution

The paper proposes DenseIL, a framework that pairs a CNN encoder with an attention-based Dense Interaction (DI) decoder and adds a Spatio-TEmporal Positional Embedding (STEP-Emb) to improve re-ID performance.

Findings

01 Outperforms state-of-the-art on multiple datasets

02 Effectively models multi-grained spatial-temporal features

03 Demonstrates significant accuracy improvements

Abstract

Video-based person re-identification (re-ID) aims at matching the same person across video clips. Efficiently exploiting multi-scale fine-grained features while building the structural interaction among them is pivotal for its success. In this paper, we propose a hybrid framework, Dense Interaction Learning (DenseIL), that takes the principal advantages of both CNN-based and Attention-based architectures to tackle video-based person re-ID difficulties. DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder. The CNN encoder is responsible for efficiently extracting discriminative spatial features, while the DI decoder is designed to densely model the inherent spatial-temporal interaction across frames. Unlike previous works, we additionally let the DI decoder densely attend to intermediate fine-grained CNN features, which naturally yields multi-grained spatial-temporal…
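The idea of a decoder that attends to intermediate CNN features across frames can be illustrated with a toy sketch. This is not the authors' implementation: the names `attend` and `dense_interaction`, the choice of a single query vector, the residual update, and the assumption that every stage has already been projected to a common dimension are all hypothetical and chosen only to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Scaled dot-product attention of one query vector over a list of tokens.

    Returns the attended context vector and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens
    ]
    weights = softmax(scores)
    context = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
    return context, weights


def dense_interaction(stage_tokens, query):
    """Refine a query by attending to each CNN stage in turn (hypothetical scheme).

    stage_tokens: one token list per CNN stage; each list gathers the tokens
    of that stage from all frames, so attention spans space and time.
    """
    for tokens in stage_tokens:
        context, _ = attend(query, tokens)
        # Residual update (an assumption): add the attended context to the query.
        query = [q + c for q, c in zip(query, context)]
    return query


# Made-up features: two stages, each holding tokens from two frames (dim 2).
coarse_stage = [[1.0, 0.0], [0.0, 1.0]]
fine_stage = [[0.5, 0.5], [1.0, 1.0]]
clip_feature = dense_interaction([fine_stage, coarse_stage], [0.3, -0.2])
print(clip_feature)
```

Attending to several stages, rather than only the last one, is what would let the fused clip feature mix fine-grained and coarse cues in this sketch.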

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Human Pose and Action Recognition

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Dropout · Residual Connection