GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates
Xingyu Luo, Yidong Cai, Jie Liu, Jie Tang, Gangshan Wu, Limin Wang

TL;DR
GLAD is a generative language-assisted visual tracking method that uses diffusion models to improve the fusion of visual and linguistic information, especially for low-semantic templates, and achieves state-of-the-art results.
Contribution
This paper presents a novel diffusion-based generative fusion approach for vision-language tracking, addressing limitations of existing methods with low-semantic images.
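This page does not describe how GLAD's diffusion-based fusion is formulated. For readers new to diffusion models, the sketch below shows only the standard DDPM forward (noising) process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with the commonly used linear β schedule (T = 1000, β from 1e-4 to 0.02). It is a generic textbook illustration under those assumptions, not the paper's fusion module: the function names are hypothetical, the toy vector merely stands in for a template feature, and any text conditioning GLAD may use is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, noise):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_beta_schedule(1000))
    x0 = [0.5, -1.0, 2.0]  # hypothetical toy stand-in for a flattened template feature
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    print(q_sample(x0, 0, abar, eps))    # almost identical to x0
    print(q_sample(x0, 999, abar, eps))  # almost pure Gaussian noise
```

A generative model is trained to reverse this corruption step by step; the forward process itself has no learned parameters, which is why it can be written in closed form.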
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Effectively restores blurry and ambiguous template images.
Maintains impressive inference speed.
Abstract
Vision-language tracking has gained increasing attention in many scenarios. This task processes visual and linguistic information simultaneously to localize objects in videos. Despite its growing utility, the development of vision-language tracking methods remains at an early stage. Current vision-language trackers usually employ Transformer architectures to interactively integrate template, search, and text features. However, persistent challenges posed by low-semantic images, such as pervasive blur and low resolution, can degrade cross-modal understanding and thus compromise model performance. To address this problem, language assistance is commonly used to overcome the obstacles posed by low-semantic images. However, because of the existing gap between current textual and visual features, directly concatenating and fusing these features may have limited…
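To make "direct concatenation and fusion" of template, search, and text features concrete, here is a toy sketch of a generic joint-attention baseline of the kind the abstract refers to. It is not GLAD. Assumptions: a single attention head with identity query/key/value projections, no positional embeddings, tiny hand-written token vectors, and hypothetical function names (`concat_fusion`, `self_attention`).

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output token is a convex combination of all input tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def concat_fusion(template, search, text):
    """Hypothetical baseline: concatenate the three token sets and let joint attention mix them."""
    tokens = template + search + text
    fused = self_attention(tokens)
    # Keep only the search-region tokens, which a (not shown) box head would consume.
    n_t = len(template)
    return fused[n_t:n_t + len(search)]

if __name__ == "__main__":
    template = [[1.0, 0.0]]           # toy template token
    search = [[0.0, 1.0], [1.0, 1.0]]  # toy search-region tokens
    text = [[0.5, 0.5]]                # toy text token
    print(concat_fusion(template, search, text))
```

Because the text tokens enter the same attention pool as raw visual tokens, nothing in this baseline explicitly bridges the gap between the two feature spaces; that gap is the limitation the abstract points to.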
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Face recognition and analysis
