TL;DR
This paper introduces TridentAlign and context embedding modules to improve Siamese network-based visual tracking, effectively handling scale variations and distractors while maintaining real-time performance.
Contribution
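The paper presents two novel modules for Siamese tracking networks: TridentAlign, which improves adaptability to large scale variations and deformations of the target, and a context embedding module, which improves discrimination between the target and distractor objects.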
Findings
Achieves performance comparable to state-of-the-art trackers.
Operates at real-time speed.
Effectively handles large scale variations and distractors.
Abstract
Recent advances in Siamese network-based visual tracking methods have enabled high performance on numerous tracking benchmarks. However, extensive scale variations of the target object and distractor objects with similar categories have consistently posed challenges in visual tracking. To address these persisting issues, we propose novel TridentAlign and context embedding modules for Siamese network-based visual tracking methods. The TridentAlign module facilitates adaptability to extensive scale variations and large deformations of the target, where it pools the feature representation of the target object into multiple spatial dimensions to form a feature pyramid, which is then utilized in the region proposal stage. Meanwhile, the context embedding module aims to discriminate the target from distractor objects by accounting for the global context information among objects. The context…
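The pooling step described for TridentAlign (pooling the target's feature representation into multiple spatial dimensions to form a feature pyramid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `adaptive_avg_pool` and `target_pyramid`, the use of average pooling, the integer box format, and the output sizes (3, 5, 9) are all assumptions chosen for illustration only.

```python
import numpy as np


def adaptive_avg_pool(feat, out_size):
    """Average-pool a (C, H, W) array to (C, out_size, out_size).

    Hypothetical helper: bins are split proportionally (floor for the
    start index, ceil for the end index), so every bin is non-empty.
    """
    c, h, w = feat.shape
    out = np.empty((c, out_size, out_size), dtype=np.float64)
    for i in range(out_size):
        r0 = (i * h) // out_size
        r1 = -((-(i + 1) * h) // out_size)  # ceil((i + 1) * h / out_size)
        for j in range(out_size):
            c0 = (j * w) // out_size
            c1 = -((-(j + 1) * w) // out_size)  # ceil((j + 1) * w / out_size)
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def target_pyramid(feature_map, box, sizes=(3, 5, 9)):
    """Pool the target region of a feature map into several spatial sizes.

    feature_map: (C, H, W) array of backbone features (assumed layout).
    box: (x0, y0, x1, y1) integer target box in feature-map coordinates.
    sizes: illustrative output resolutions, not taken from the paper.
    """
    x0, y0, x1, y1 = box
    crop = feature_map[:, y0:y1, x0:x1]
    return [adaptive_avg_pool(crop, s) for s in sizes]


# Usage: a random 8-channel feature map with a 10x6 target region.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 32, 32))
pyramid = target_pyramid(features, box=(4, 5, 14, 11))
shapes = [level.shape for level in pyramid]
```

Each level of `pyramid` is a fixed-size summary of the same target region at a different spatial resolution. In the abstract's description, such a pyramid is then used in the region proposal stage; how it is used there is not covered by this sketch.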
