VSD-MOT: End-to-End Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Distillation
Jun Du

TL;DR
This paper introduces VSD-MOT, a novel multi-object tracking framework that leverages visual semantic distillation and adaptive weighting to improve tracking accuracy in low-quality videos while maintaining performance in standard scenarios.
Contribution
The paper proposes an end-to-end multi-object tracking method guided by visual semantic distillation, combining a knowledge distillation framework with dynamic weight regulation to handle low-quality videos.
Findings
Significantly improves tracking accuracy in low-quality videos
Maintains strong performance in conventional video scenarios
Demonstrates effectiveness through extensive experiments
Abstract
Existing multi-object tracking algorithms typically fail to adequately address the issues in low-quality videos, resulting in a significant decline in tracking performance when image quality deteriorates in real-world scenarios. This performance degradation is primarily due to the algorithms' inability to effectively tackle the problems caused by information loss in low-quality images. To address the challenges of low-quality video scenarios, inspired by vision-language models, we propose a multi-object tracking framework guided by visual semantic distillation (VSD-MOT). Specifically, we introduce the CLIP Image Encoder to extract global visual semantic information from images to compensate for the loss of information in low-quality images. However, direct integration can substantially impact the efficiency of the multi-object tracking algorithm. Therefore, this paper proposes to…
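The abstract's reasoning (CLIP features help with degraded images, but running the CLIP Image Encoder inside the tracker is too costly) is the usual motivation for knowledge distillation: a lightweight student is trained to reproduce a frozen teacher's embedding, so that only the student runs at inference. The paper's actual distillation design is cut off above, so what follows is a minimal, generic sketch and not VSD-MOT's method. It assumes a cosine-based feature-alignment loss added to a tracking loss with a fixed weight; the function names, the `distill_weight` parameter, and the toy vectors standing in for CLIP and student embeddings are all hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def distillation_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Generic feature-distillation loss: 1 - cos(student, teacher).

    It is 0 when the student embedding points the same way as the
    (frozen) teacher embedding, 1 when orthogonal, and 2 when opposite.
    """
    return 1.0 - cosine_similarity(student, teacher)


def total_loss(
    tracking_loss: float,
    student: Sequence[float],
    teacher: Sequence[float],
    distill_weight: float = 1.0,
) -> float:
    """Hypothetical training objective: tracking loss plus a weighted distillation term."""
    return tracking_loss + distill_weight * distillation_loss(student, teacher)


if __name__ == "__main__":
    teacher = [0.2, 0.9, -0.4]  # stand-in for a frozen CLIP image embedding (hypothetical values)
    student = [0.1, 1.0, -0.3]  # stand-in for a lightweight student's projected embedding
    print(distillation_loss(student, teacher))
    print(total_loss(0.5, student, teacher, distill_weight=2.0))
```

In a real training loop these would be batched tensors and gradients would flow only into the student. The cosine form is one common choice because it ignores embedding scale; an L2 loss on normalized features is a close alternative.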
Taxonomy
Topics: Video Surveillance and Tracking Methods · Image and Video Quality Assessment · Visual Attention and Saliency Detection
