Depth-Adaptive Computational Policies for Efficient Visual Tracking

Chris Ying; Katerina Fragkiadaki

arXiv:1801.00508·cs.CV·January 3, 2018


Open Access

TL;DR

This paper introduces a depth-adaptive convolutional Siamese network for video object tracking that dynamically adjusts computation depth, balancing accuracy and efficiency, and outperforming fixed-structure networks in cost-accuracy trade-offs.

Contribution

The paper proposes a depth-adaptive neural network with parametric gating for efficient video tracking, so that the depth of computation can vary with the difficulty of each frame.

Findings

01

Achieves accuracy comparable to the state of the art on the VOT2016 benchmark.

02

Achieves higher accuracy for a given computational cost than fixed-structure networks.

03

Extends to other CNN-based tasks for runtime speed-accuracy trade-offs.

Abstract

Current convolutional neural network algorithms for video object tracking spend the same amount of computation on every object and video frame. However, some frames are harder to track than others, owing to varying amounts of clutter, scene complexity, and motion, and to how distinct the object is from its background. We propose a depth-adaptive convolutional Siamese network that performs video tracking adaptively at multiple neural network depths. Parametric gating functions are trained to control the depth of the convolutional feature extractor by minimizing a joint loss of computational cost and tracking error. Our network achieves accuracy comparable to the state-of-the-art on the VOT2016 benchmark. Furthermore, our adaptive depth computation achieves higher accuracy for a given computational cost than traditional fixed-structure neural networks. The presented…
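The abstract does not spell out the gating formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: each depth emits a stop probability, those probabilities induce a distribution over exit depths, and the training objective is the expected tracking error plus a weighted expected cost. The function names, the trade-off weight `lam`, and the thresholded inference rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def halting_distribution(gates: Sequence[float]) -> List[float]:
    """Convert per-depth stop probabilities into a distribution over exit depths.

    Assumption: the last depth always stops, so it absorbs the remaining mass
    and the returned probabilities sum to 1.
    """
    probs: List[float] = []
    remaining = 1.0  # probability of having reached the current depth
    last = len(gates) - 1
    for depth, gate in enumerate(gates):
        stop = 1.0 if depth == last else gate
        probs.append(remaining * stop)
        remaining *= 1.0 - stop
    return probs


def joint_loss(
    gates: Sequence[float],
    errors: Sequence[float],
    costs: Sequence[float],
    lam: float,
) -> float:
    """Expected tracking error plus lam times expected computational cost.

    errors[d] and costs[d] are the (hypothetical) tracking error and compute
    cost incurred when the network exits at depth d.
    """
    probs = halting_distribution(gates)
    expected_error = sum(p * e for p, e in zip(probs, errors))
    expected_cost = sum(p * c for p, c in zip(probs, costs))
    return expected_error + lam * expected_cost


def choose_depth(gates: Sequence[float], threshold: float = 0.5) -> int:
    """Inference rule (an assumption): exit at the first depth whose gate
    reaches the threshold, otherwise run the full network."""
    for depth, gate in enumerate(gates):
        if gate >= threshold:
            return depth
    return len(gates) - 1


if __name__ == "__main__":
    gates = [0.5, 0.5, 0.0]    # stop probabilities at three depths
    errors = [0.4, 0.2, 0.1]   # deeper exits track more accurately
    costs = [1.0, 2.0, 3.0]    # deeper exits cost more compute
    print(halting_distribution(gates))
    print(joint_loss(gates, errors, costs, lam=0.1))
    print(choose_depth([0.2, 0.7, 0.9]))
```

Raising `lam` in this sketch penalizes deep exits more heavily, which is one way to picture the trade-off between computational cost and tracking error that the abstract describes.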

Taxonomy

Topics: Video Surveillance and Tracking Methods · Image Enhancement Techniques · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Siamese Network