VILLS -- Video-Image Learning to Learn Semantics for Person Re-Identification

Siyuan Huang; Ram Prabhakar; Yuxiang Guo; Rama Chellappa; Cheng Peng

arXiv:2311.17074·cs.CV·October 28, 2024·1 cites

Open Access

TL;DR

VILLS is a self-supervised approach that jointly learns spatial and temporal features from images and videos to improve person re-identification robustness in challenging real-world scenarios.

Contribution

It introduces a novel unified framework with semantic extraction and feature adaptation modules, achieving state-of-the-art results in person re-identification.

Findings

01. VILLS outperforms existing methods significantly.

02. The method effectively combines image and video modalities.

03. It demonstrates robustness in real-world, unconstrained environments.

Abstract

Person Re-identification is a research area with significant real-world applications. Despite recent progress, existing methods face challenges in robust re-identification in the wild, e.g., by focusing only on a particular modality and on unreliable patterns such as clothing. A generalized method is highly desired, but remains elusive due to issues such as the trade-off between spatial and temporal resolution and imperfect feature extraction. We propose VILLS (Video-Image Learning to Learn Semantics), a self-supervised method that jointly learns spatial and temporal features from images and videos. VILLS first designs a local semantic extraction module that adaptively extracts semantically consistent and robust spatial features. Then, VILLS designs a unified feature learning and adaptation module to represent image and video modalities in a consistent feature space. By…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Gait Recognition and Analysis

Methods: Focus