RSINet: Rotation-Scale Invariant Network for Online Visual Tracking
Yang Fang, Geun-Sik Jo, Chang-Hee Lee

TL;DR
RSINet is a real-time visual tracking network that explicitly learns rotation and scale variations, adaptively updates its model, and achieves state-of-the-art accuracy on multiple benchmarks.
Contribution
The paper introduces RSINet, a novel tracker with explicit rotation-scale estimation and adaptive model updating, improving accuracy and robustness over existing Siamese-based trackers.
Findings
Achieves state-of-the-art performance on OTB-100, VOT2018, and LaSOT benchmarks.
Runs in real time at approximately 45 FPS.
Effectively estimates rotation and scale transformations during tracking.
Abstract
Most Siamese network-based trackers perform the tracking process without model update and cannot learn target-specific variation adaptively. Moreover, Siamese-based trackers infer the new state of tracked objects by generating axis-aligned bounding boxes, which contain extra background noise and cannot accurately estimate the rotation and scale transformation of moving objects, thus potentially reducing tracking performance. In this paper, we propose a novel Rotation-Scale Invariant Network (RSINet) to address the above problems. Our RSINet tracker consists of a target-distractor discrimination branch and a rotation-scale estimation branch; the rotation and scale knowledge can be explicitly learned by a multi-task learning method in an end-to-end manner. In addition, the tracking model is adaptively optimized and updated under spatio-temporal energy control, which ensures model…
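The abstract's point about axis-aligned boxes containing extra background can be illustrated with plain geometry. The sketch below is not taken from the paper or from RSINet's code; the function names and the example target size (100 x 40 pixels) are assumptions chosen for illustration. It computes the tightest axis-aligned box around a rotated rectangular target and the fraction of that box that is background.

```python
import math


def axis_aligned_extent(w, h, angle_deg):
    """Width and height of the tightest axis-aligned box that encloses
    a w-by-h rectangle rotated by angle_deg about its center."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return w * c + h * s, w * s + h * c


def background_fraction(w, h, angle_deg):
    """Fraction of the enclosing axis-aligned box NOT covered by the
    rotated target (0.0 means the box fits the target exactly)."""
    box_w, box_h = axis_aligned_extent(w, h, angle_deg)
    return 1.0 - (w * h) / (box_w * box_h)


if __name__ == "__main__":
    # Hypothetical 100 x 40 target at several in-plane rotations.
    for angle in (0, 15, 30, 45, 90):
        frac = background_fraction(100, 40, angle)
        print(f"{angle:3d} deg: background fraction = {frac:.3f}")
```

For this hypothetical target, an unrotated box is a perfect fit, while at 45 degrees roughly 59% of the axis-aligned box is background. A tracker that estimates rotation explicitly can, in principle, avoid feeding that extra area into its appearance model.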
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Video Analysis and Summarization
