DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification

Yujie Yang; Shuang Li; Jun Ye; Neng Dong; Fan Li; Huafeng Li

arXiv:2511.04281 · cs.CV · November 7, 2025

TL;DR

This paper introduces a novel gait representation learning framework for video-based visible-infrared person re-identification, leveraging DINOv2 priors and bidirectional multi-granularity enhancement to improve cross-modal matching accuracy.

Contribution

The paper proposes a DINOv2-driven gait learning framework (DinoGRL) with semantic-aware silhouette enhancement and bidirectional multi-granularity refinement, addressing the tendency of appearance-only methods to overlook gait cues and their temporal dynamics.

Findings

01. Outperforms state-of-the-art on HITSZ-VCM and BUPT datasets.

02. Effectively integrates gait features with appearance cues for robust cross-modal re-identification.

03. Demonstrates significant accuracy improvements over existing methods.

Abstract

Video-based Visible-Infrared person re-identification (VVI-ReID) aims to retrieve the same pedestrian across visible and infrared modalities from video sequences. Existing methods tend to exploit modality-invariant visual features but largely overlook gait features, which are not only modality-invariant but also rich in temporal dynamics, thus limiting their ability to model the spatiotemporal consistency essential for cross-modal video matching. To address these challenges, we propose a DINOv2-Driven Gait Representation Learning (DinoGRL) framework that leverages the rich visual priors of DINOv2 to learn gait features complementary to appearance cues, facilitating robust sequence-level representations for cross-modal retrieval. Specifically, we introduce a Semantic-Aware Silhouette and Gait Learning (SASGL) model, which generates and enhances silhouette representations with…
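
To make the retrieval setting concrete, the following is a minimal sketch of sequence-level cross-modal matching in which an appearance descriptor and a gait descriptor are fused before ranking. It is not the DinoGRL method: the mean pooling over frames, the concatenation-based fusion, the toy two-dimensional features, and all function names (`temporal_mean_pool`, `fuse`, `rank_gallery`) are assumptions chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def temporal_mean_pool(frame_features):
    """Average per-frame feature vectors into one sequence-level vector."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def fuse(appearance_frames, gait_frames):
    """Hypothetical fusion: concatenate normalized appearance and gait descriptors."""
    appearance = l2_normalize(temporal_mean_pool(appearance_frames))
    gait = l2_normalize(temporal_mean_pool(gait_frames))
    # List "+" concatenates the two descriptors; renormalize the result.
    return l2_normalize(appearance + gait)


def cosine(u, v):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(u, v))


def rank_gallery(query, gallery):
    """Return gallery identities sorted by descending similarity to the query."""
    return sorted(gallery, key=lambda pid: cosine(query, gallery[pid]), reverse=True)


# Toy infrared query sequence (two frames) for identity "A".
query = fuse(
    appearance_frames=[[1.0, 0.0], [0.8, 0.2]],
    gait_frames=[[0.0, 1.0], [0.1, 0.9]],
)

# Toy visible gallery sequences for identities "A" and "B".
gallery = {
    "A": fuse([[0.9, 0.1], [1.0, 0.0]], [[0.0, 1.0], [0.2, 0.8]]),
    "B": fuse([[0.0, 1.0], [0.1, 0.9]], [[1.0, 0.0], [0.9, 0.1]]),
}

ranking = rank_gallery(query, gallery)
print(ranking)  # ['A', 'B']
```

In a real system the per-frame features would come from learned encoders rather than hand-written lists, and the fusion step would typically be learned as well; the sketch only shows how a query from one modality is ranked against a gallery from the other.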

Taxonomy

Topics: Gait Recognition and Analysis · Human Pose and Action Recognition · Video Surveillance and Tracking Methods