A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification
Xuehu Liu, Pingping Zhang, Chenyang Yu, Huchuan Lu, Xuesheng Qian and Xiaoyun Yang

TL;DR
This paper introduces Trigeminal Transformers (TMT), a framework for video-based person re-identification that jointly captures spatial, temporal, and spatial-temporal features and outperforms existing methods.
Contribution
The paper proposes a trigeminal feature extractor and multiple transformer modules that, for the first time, comprehensively model video data across the spatial, temporal, and spatial-temporal feature domains.
Findings
Achieves superior performance on public Re-ID benchmarks.
Effectively captures multi-view features for richer video representations.
Outperforms state-of-the-art methods in accuracy.
Abstract
Video-based person re-identification (Re-ID) aims to retrieve video sequences of the same person under non-overlapping cameras. Previous methods usually focus on limited views, such as the spatial, temporal or spatial-temporal view, and thus lack observations from different feature domains. To capture richer perceptions and extract more comprehensive video representations, in this paper we propose a novel framework named Trigeminal Transformers (TMT) for video-based person Re-ID. More specifically, we design a trigeminal feature extractor to jointly transform raw video data into the spatial, temporal and spatial-temporal domains. In addition, inspired by the great success of vision transformers, we introduce the transformer structure for video-based person Re-ID. In our work, three self-view transformers are proposed to exploit the relationships between local features for information enhancement in…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Face recognition and analysis
