VSD-MOT: End-to-End Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Distillation

Jun Du

arXiv:2603.20731 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper introduces VSD-MOT, a multi-object tracking framework that uses visual semantic distillation and adaptive weighting to improve tracking accuracy in low-quality videos while maintaining performance in standard scenarios.

Contribution

The paper proposes a new end-to-end multi-object tracking method guided by visual semantic distillation, incorporating a knowledge distillation framework and dynamic weight regulation for low-quality videos.
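
As a rough illustration of how a distillation term with a dynamically regulated weight might enter a training objective, consider the minimal sketch below. It is not the paper's implementation: the mean-squared-error feature loss, the linear weighting rule, the `quality_score` input in [0, 1], and all function and parameter names are assumptions made for illustration only.

```python
from typing import Sequence


def distillation_loss(student_feat: Sequence[float],
                      teacher_feat: Sequence[float]) -> float:
    """Mean squared error between student features and frozen teacher features.

    Assumption: MSE is used here only as a common feature-distillation loss;
    the source does not state which loss the paper uses.
    """
    if len(student_feat) != len(teacher_feat) or len(student_feat) == 0:
        raise ValueError("feature vectors must be non-empty and equal in length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def quality_weight(quality_score: float,
                   w_min: float = 0.1,
                   w_max: float = 1.0) -> float:
    """Hypothetical weighting rule: lower image quality gives a larger weight.

    quality_score is assumed to lie in [0, 1], with 1 meaning a clean frame.
    """
    q = min(max(quality_score, 0.0), 1.0)  # clamp to [0, 1]
    return w_min + (w_max - w_min) * (1.0 - q)


def total_loss(tracking_loss: float,
               student_feat: Sequence[float],
               teacher_feat: Sequence[float],
               quality_score: float) -> float:
    """Tracking loss plus a quality-weighted distillation term."""
    w = quality_weight(quality_score)
    return tracking_loss + w * distillation_loss(student_feat, teacher_feat)
```

Under these assumptions, a heavily degraded frame (`quality_score` near 0) leans more on the teacher's semantic features, while a clean frame (`quality_score` near 1) leaves the tracking loss dominant.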

Findings

01 Significantly improves tracking accuracy in low-quality videos

02 Maintains strong performance in conventional video scenarios

03 Demonstrates effectiveness through extensive experiments

Abstract

Existing multi-object tracking algorithms typically fail to adequately address the issues in low-quality videos, resulting in a significant decline in tracking performance when image quality deteriorates in real-world scenarios. This performance degradation is primarily due to the algorithms' inability to effectively tackle the problems caused by information loss in low-quality images. To address the challenges of low-quality video scenarios, inspired by vision-language models, we propose a multi-object tracking framework guided by visual semantic distillation (VSD-MOT). Specifically, we introduce the CLIP Image Encoder to extract global visual semantic information from images to compensate for the loss of information in low-quality images. However, direct integration can substantially impact the efficiency of the multi-object tracking algorithm. Therefore, this paper proposes to…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Image and Video Quality Assessment · Visual Attention and Saliency Detection