DensSiam: End-to-End Densely-Siamese Network with Self-Attention Model for Object Tracking
Mohamed H. Abdelpakey, Mohamed S. Shehata, and Mostafa M. Mohamed

TL;DR
DensSiam introduces a densely connected Siamese network with self-attention for object tracking, improving robustness, accuracy, and efficiency by capturing non-local features and reducing shared parameters.
Contribution
The paper proposes DensSiam, a novel deep Siamese architecture with dense layers and self-attention, enhancing tracking performance and generalization over existing models.
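The dense connectivity described here follows the DenseNet pattern: each layer receives the channel-wise concatenation of the block input and every earlier layer's output. The sketch below is illustrative only. It assumes a 1-D spatial grid, 1x1 convolutions with hand-picked weights, and a growth rate of one channel per layer. `conv1x1_relu` and `dense_block` are hypothetical helpers, not code from DensSiam.

```python
from typing import Callable, List

# Feature map as a list of spatial positions, each holding a vector of channel values.
# (Hypothetical simplification: a 1-D spatial grid instead of a 2-D image.)
FeatureMap = List[List[float]]
Layer = Callable[[FeatureMap], FeatureMap]


def conv1x1_relu(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution + ReLU: each output channel is a weighted sum of the input channels."""
    out = []
    for v in x:
        if any(len(row) != len(v) for row in weights):
            raise ValueError("each weight row needs one entry per input channel")
        out.append([max(0.0, sum(w * c for w, c in zip(row, v))) for row in weights])
    return out


def dense_block(x: FeatureMap, layers: List[Layer]) -> FeatureMap:
    """Dense connectivity: layer l sees the concatenation of x and all earlier layer outputs."""
    features = [list(v) for v in x]
    for layer in layers:
        new = layer(features)  # input = everything produced so far in this block
        # Concatenate along the channel axis, position by position.
        features = [old + added for old, added in zip(features, new)]
    return features


# Usage: 2 positions, 2 input channels, growth rate of 1 channel per layer.
x = [[1.0, 0.5], [2.0, -1.0]]
layers = [
    lambda f: conv1x1_relu(f, [[1.0, 1.0]]),       # sees the 2 input channels
    lambda f: conv1x1_relu(f, [[0.0, 0.0, 2.0]]),  # sees 2 input + 1 new channel
]
out = dense_block(x, layers)
print(out)  # [[1.0, 0.5, 1.5, 3.0], [2.0, -1.0, 1.0, 2.0]]
```

The second layer's weight row needs three entries because the channel count grows by the growth rate after every layer. Since every layer can reuse all earlier features, DenseNet-style blocks can keep each individual layer narrow.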
Findings
Achieves superior accuracy on multiple benchmarks
Maintains real-time processing speed
Improves robustness to appearance changes
Abstract
Convolutional Siamese neural networks have recently been used to track objects using deep features. Siamese architectures can achieve real-time speed; however, it is still difficult to find a Siamese architecture that maintains generalization capability, high accuracy, and speed while decreasing the number of shared parameters, especially when the network is very deep. Furthermore, a conventional Siamese architecture usually processes one local neighborhood at a time, which makes the appearance model local and non-robust to appearance changes. To overcome these two problems, this paper proposes DensSiam, a novel convolutional Siamese architecture that uses the concept of dense layers and connects each dense layer to all layers in a feed-forward fashion with a similarity-learning function. DensSiam also includes a Self-Attention mechanism to force the network to pay more attention to the…
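The abstract stops before defining the self-attention module or the similarity-learning function. The sketch below therefore fills both in with assumptions rather than the authors' formulation. It uses a SAGAN-style self-attention layer (query/key/value projections, a softmax over all positions, and a residual output scaled by a learnable `gamma`) and a SiamFC-style cross-correlation that slides the exemplar embedding over the search-region embedding. Feature maps are simplified to a 1-D list of positions. `self_attention`, `cross_correlation`, and all weight values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
FeatureMap = List[Vector]  # positions x channels (hypothetical 1-D spatial grid)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: FeatureMap, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                   gamma: float) -> FeatureMap:
    """Each output position attends to all input positions (non-local), plus a residual.

    w_v must map C channels to C channels so the residual addition lines up.
    """
    q = [matvec(w_q, p) for p in x]
    k = [matvec(w_k, p) for p in x]
    v = [matvec(w_v, p) for p in x]
    out = []
    for j, xj in enumerate(x):
        weights = softmax([dot(q[j], k[i]) for i in range(len(x))])
        attended = [sum(weights[i] * v[i][c] for i in range(len(x))) for c in range(len(xj))]
        out.append([gamma * a + b for a, b in zip(attended, xj)])
    return out


def cross_correlation(exemplar: FeatureMap, search: FeatureMap) -> Vector:
    """Similarity score for each valid offset of the exemplar inside the search region."""
    n = len(exemplar)
    return [sum(dot(exemplar[t], search[u + t]) for t in range(n))
            for u in range(len(search) - n + 1)]


# Usage with zero query/key weights: attention is uniform, so each output is x_j + mean(x).
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [[1.0, 0.0], [3.0, 2.0]]
print(self_attention(x, zeros, zeros, identity, gamma=1.0))  # [[3.0, 1.0], [5.0, 3.0]]

# The score map peaks at the offset where the exemplar best matches the search region.
scores = cross_correlation([[1.0], [1.0]], [[0.0], [1.0], [1.0], [0.0]])
print(scores)  # [1.0, 2.0, 1.0]
```

In SAGAN, `gamma` starts at 0, so the block initially behaves as an identity mapping and learns how much non-local context to mix in. Because every position attends to every other position, the appearance model is no longer limited to one local neighborhood.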
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Fire Detection and Safety Systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
