S3-CLIP: Video Super Resolution for Person-ReID

Tamas Endrei; Gyorgy Cserey

arXiv:2601.08807 · cs.CV · January 14, 2026

Open Access

TL;DR

S3-CLIP is a video super-resolution-based CLIP-ReID framework for person re-identification. It improves tracklet quality and ranking accuracy in challenging cross-view scenarios, and the authors describe it as the first systematic exploration of super-resolution in this context.

Contribution

This work is the first to systematically investigate video super-resolution as a means to enhance tracklet quality for person ReID in challenging scenarios.

Findings

01. Achieves 37.52% mAP in aerial-to-ground scenarios.

02. Improves Rank-1 accuracy by 11.24% in ground-to-aerial scenarios.

03. Performs competitively with baseline methods.
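
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how Rank-k accuracy and mAP are commonly computed for ReID from a query-by-gallery distance matrix. It is not the authors' evaluation code or the challenge protocol: the function name `rank_k_and_map`, the toy distances, and the identity labels are all hypothetical, and the sketch omits the same-camera filtering that many ReID benchmarks apply.

```python
from typing import Dict, List, Sequence, Tuple


def rank_k_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ks: Sequence[int] = (1, 5),
) -> Tuple[Dict[int, float], float]:
    """Return ({k: Rank-k accuracy}, mAP) for a query-by-gallery distance matrix."""
    hits = {k: 0 for k in ks}
    average_precisions: List[float] = []

    for q, row in enumerate(dist):
        # Sort gallery indices by increasing distance (best match first).
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # queries with no true match in the gallery are skipped

        # Rank-k: a hit if a correct identity appears within the top k results.
        for k in ks:
            if any(matches[:k]):
                hits[k] += 1

        # Average precision: mean of the precision at each correct match.
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        average_precisions.append(sum(precisions) / len(precisions))

    valid_queries = len(average_precisions)
    if valid_queries == 0:
        raise ValueError("no query has a matching gallery identity")
    cmc = {k: hits[k] / valid_queries for k in ks}
    return cmc, sum(average_precisions) / valid_queries


# Toy example (hypothetical values): two queries of identity 1, three gallery items.
dist = [[0.1, 0.9, 0.4], [0.8, 0.2, 0.3]]
query_ids = [1, 1]
gallery_ids = [1, 2, 1]
cmc, mean_ap = rank_k_and_map(dist, query_ids, gallery_ids)
print(cmc, mean_ap)
```

In this toy case the first query retrieves a correct identity at rank 1 and the second only at rank 2, so Rank-1 is 0.5 while Rank-5 is 1.0; mAP additionally rewards placing all correct matches early in the ranking.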

Abstract

Tracklet quality is often treated as an afterthought in person re-identification (ReID) methods, with the majority of research presenting architectural modifications to foundational models. Such approaches neglect an important limitation, which poses challenges when deploying ReID systems in difficult real-world scenarios. In this paper, we introduce S3-CLIP, a video super-resolution-based CLIP-ReID framework developed for the VReID-XFD challenge at WACV 2026. The proposed method integrates recent advances in super-resolution networks with task-driven super-resolution pipelines, adapting them to the video-based person re-identification setting. To the best of our knowledge, this work represents the first systematic investigation of video super-resolution as a means of enhancing tracklet quality for person ReID, particularly under challenging cross-view conditions. Experimental results…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Image Processing Techniques · Advanced Neural Network Applications