DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification

Ying Shu; Pujian Zhan; Huiqi Yang; Hehe Fan; Youfang Lin; Kai Lv

arXiv:2602.01059 · cs.CV · February 3, 2026

Open Access

TL;DR

DRFormer integrates local texture and global semantic features using a dual-regularized bidirectional transformer, effectively improving person re-identification performance by leveraging the complementary strengths of vision foundation and vision-language models.

Contribution

The paper introduces DRFormer, a novel framework that synergizes vision foundation and vision-language models through dual-regularization for enhanced person re-identification.

Findings

01. Achieves competitive results on five benchmarks.

02. Effectively balances local and global feature contributions.

03. Demonstrates the benefit of integrating different model paradigms.

Abstract

Both fine-grained discriminative details and global semantic features can contribute to solving person re-identification challenges, such as occlusion and pose variations. Vision foundation models (e.g., DINO) excel at mining local textures, and vision-language models (e.g., CLIP) capture strong global semantic differences. Existing methods predominantly rely on a single paradigm, neglecting the potential benefits of their integration. In this paper, we analyze the complementary roles of these two architectures and propose a framework to synergize their strengths through a Dual-Regularized Bidirectional Transformer (DRFormer). The dual-regularization mechanism ensures diverse feature extraction and achieves a better balance in the contributions of the two models. Extensive experiments on five benchmarks show that our method effectively…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gait Recognition and Analysis · Advanced Neural Network Applications