End-to-end Deep Object Tracking with Circular Loss Function for Rotated Bounding Box
Vladislav Belyaev, Aleksandra Malysheva, Aleksei Shpilman

TL;DR
This paper introduces DOTCL, an end-to-end deep learning object tracking model utilizing a circular loss function that accounts for overlap and orientation, significantly improving robustness and accuracy on rotated bounding box datasets.
Contribution
The paper presents a novel Transformer-based tracking model with a circular loss function designed specifically for rotated bounding boxes, advancing the state of the art in robustness and accuracy.
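For readers unfamiliar with rotated bounding boxes, the sketch below shows one common way to describe such a box and recover its corners. The parameterization (centre, width, height, angle in radians) and the function name `rotated_box_corners` are assumptions made for illustration; this summary does not state how the paper or the VOT annotations encode the box.

```python
import math

def rotated_box_corners(cx, cy, w, h, angle):
    """Corners of a w-by-h box centred at (cx, cy) and rotated by `angle` radians.

    Hypothetical helper for illustration; not taken from the paper.
    """
    c, s = math.cos(angle), math.sin(angle)
    # Corner offsets of the axis-aligned box, relative to its centre.
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by `angle`, then translate to the centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

# With angle = 0 the result is an ordinary axis-aligned box.
corners = rotated_box_corners(0.0, 0.0, 2.0, 1.0, 0.0)
```

Setting `angle` to zero reduces this to the axis-aligned case, which is why rotated boxes can be viewed as an extension of axis-aligned ones.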
Findings
Outperforms current state-of-the-art models on the VOT2018 dataset.
Shows significant robustness improvements over existing methods.
Achieves a higher expected average overlap (EAO) score.
Abstract
The task of object tracking is vital in numerous applications such as autonomous driving, intelligent surveillance, and robotics. This task entails assigning a bounding box to an object in a video stream, given only the bounding box for that object in the first frame. In 2015, a new type of video object tracking (VOT) dataset was created that introduced rotated bounding boxes as an extension of axis-aligned ones. In this work, we introduce a novel end-to-end deep learning method based on the Transformer Multi-Head Attention architecture. We also present a new type of loss function, which takes into account the bounding box overlap and orientation. Our Deep Object Tracking model with Circular Loss Function (DOTCL) shows a considerable improvement in terms of robustness over current state-of-the-art end-to-end deep learning models. It also outperforms state-of-the-art object…
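The abstract does not give the formula for the circular loss, so the following is only a sketch of the general idea of a loss that combines overlap with a periodic orientation term. Everything in it is an assumption made for illustration: the function names, the period of π (a rectangle rotated by π coincides with itself), the cosine form of the penalty, and the weighting parameter `angle_weight`. It should not be read as the loss used in DOTCL.

```python
import math

def circular_angle_penalty(pred_angle, true_angle, period=math.pi):
    """Penalty in [0, 1] that vanishes when the angles agree modulo `period`.

    Illustrative assumption, not the paper's formula.
    """
    diff = pred_angle - true_angle
    # The cosine makes the penalty periodic, so angles just below and just
    # above the wrap-around point are treated as close to each other.
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * diff / period))

def overlap_and_orientation_loss(iou, pred_angle, true_angle, angle_weight=1.0):
    """Hypothetical combination of an overlap term and an orientation term.

    `iou` is the overlap of predicted and ground-truth boxes, assumed to be
    computed elsewhere and to lie in [0, 1].
    """
    return (1.0 - iou) + angle_weight * circular_angle_penalty(pred_angle, true_angle)

# A prediction off by a full period incurs (numerically) no angle penalty.
loss = overlap_and_orientation_loss(0.8, 0.1, 0.1 + math.pi)
```

A plain squared difference on the angle would penalize such a prediction heavily even though the two boxes have the same orientation, which is the kind of discontinuity a periodic term avoids.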
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Residual Connection · Adam · Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Dropout
