Describe and Attend to Track: Learning Natural Language guided Structural Representation and Visual Attention for Object Tracking
Xiao Wang, Chenglong Li, Rui Yang, Tianzhu Zhang, Jin Tang, Bin Luo

TL;DR
This paper introduces a structure-aware deep neural network for object tracking that leverages graph-based relationships among samples and integrates natural language cues to improve robustness and re-identification, especially under occlusion.
Contribution
It proposes a novel graph-based neural network that incorporates natural language guidance and visual attention mechanisms for enhanced object tracking.
Findings
Effective in handling occlusion and out-of-view scenarios
Outperforms existing methods on five benchmark datasets
Integrates natural language cues into visual tracking
Abstract
The tracking-by-detection framework requires a set of positive and negative training samples to learn robust tracking models for precise localization of target objects. However, existing tracking models mostly treat different samples independently while ignoring the relationship information among them. In this paper, we propose a novel structure-aware deep neural network to overcome such limitations. In particular, we construct a graph to represent the pairwise relationships among training samples, and additionally use natural language as supervisory information to learn both feature representations and classifiers robustly. To refine the state of the target and re-track the target when it returns to view after heavy occlusion or after going out of view, we design a novel subnetwork to learn target-driven visual attention from the guidance of both visual and natural…
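The abstract's idea of a graph over training samples can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the cosine-similarity edge weights, the row-normalized averaging step, and the helper names `cosine`, `build_affinity`, and `propagate` are all hypothetical. The paper's actual graph construction and network layers may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_affinity(features):
    """Dense pairwise affinity matrix over samples.

    Assumption: edge weight = cosine similarity, with negatives clipped to 0.
    The diagonal acts as a self-loop for every non-zero sample.
    """
    n = len(features)
    return [
        [max(0.0, cosine(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]


def propagate(features, affinity):
    """One propagation step: each sample becomes the affinity-weighted
    average of all samples (row-normalized adjacency times features)."""
    dim = len(features[0])
    smoothed = []
    for i, row in enumerate(affinity):
        total = sum(row)
        if total == 0.0:
            # Isolated sample (e.g. an all-zero feature): keep it unchanged.
            smoothed.append(list(features[i]))
            continue
        smoothed.append(
            [sum(w * f[d] for w, f in zip(row, features)) / total for d in range(dim)]
        )
    return smoothed


# Toy features: two similar "positive" samples and one dissimilar "negative" sample.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
affinity = build_affinity(features)
smoothed = propagate(features, affinity)
```

In this toy setup the two similar samples pull each other's features together, while the dissimilar sample contributes nothing to them because its edge weight is zero. A learned model would typically follow such a step with trainable weights and a nonlinearity.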
Taxonomy
Topics: Video Surveillance and Tracking Methods · Impact of Light on Environment and Health · Visual Attention and Saliency Detection
