Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
James Steven Supancic III, Deva Ramanan

TL;DR
This paper models object tracking as an online decision-making task, using deep reinforcement learning to learn policies that decide where to look, when to reinitialize, and when to update, enabling efficient learning from streaming videos.
Contribution
It introduces a novel POMDP formulation for tracking and applies deep reinforcement learning with sparse rewards, allowing scalable training on large streaming video datasets.
Findings
Learned policies replace heuristic decisions about where to look, when to reinitialize, and when to update.
Sparse rewards enable fast training on massive datasets.
Unified evaluation on streaming Internet videos.
Abstract
We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typically made heuristically. Instead, we propose to learn an optimal decision-making policy by formulating tracking as a partially observable Markov decision process (POMDP). We learn policies with deep reinforcement learning algorithms that need supervision (a reward signal) only when the track has gone awry. We demonstrate that sparse rewards allow us to quickly train on massive datasets, several orders of magnitude more than past work. Interestingly, by treating the data source of Internet videos…
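To make the decision loop and the sparse reward concrete, here is a minimal Python sketch under stated assumptions. The action names (`TRACK`, `UPDATE`, `REINIT`), the confidence thresholds, the IoU failure threshold of 0.5, and the +1/-1 reward values are hypothetical illustrations, not the paper's formulation. The hand-written `threshold_policy` stands in for the kind of heuristic the paper proposes to replace with a learned policy. The point it illustrates is that `sparse_reward` returns zero on every frame where the track is still good, so a reward signal is needed only when the track has gone awry.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class Action(Enum):
    # Hypothetical action set mirroring the three decisions named in the abstract.
    TRACK = "track"    # follow the target, keep the appearance model fixed
    UPDATE = "update"  # follow the target and update the appearance model
    REINIT = "reinit"  # declare the target lost and reinitialize


@dataclass
class Step:
    confidence: float  # partial observation: the tracker's confidence in its box
    overlap: float     # hidden ground-truth IoU, used only to compute the reward


def threshold_policy(confidence: float, lost_below: float = 0.3,
                     update_above: float = 0.8) -> Action:
    """Hypothetical heuristic policy: the kind of hand-tuned rule to be learned instead."""
    if confidence < lost_below:
        return Action.REINIT
    if confidence > update_above:
        return Action.UPDATE
    return Action.TRACK


def sparse_reward(step: Step, action: Action, fail_iou: float = 0.5) -> float:
    """Hypothetical sparse reward: zero unless the track has gone awry."""
    if step.overlap >= fail_iou:
        return 0.0  # track is fine: no supervision on this frame
    # Track has failed: reward reinitializing, penalize carrying on.
    return 1.0 if action is Action.REINIT else -1.0


def rollout(steps: Sequence[Step]) -> List[float]:
    """Run the policy over one episode and collect per-frame rewards."""
    return [sparse_reward(s, threshold_policy(s.confidence)) for s in steps]


# Toy episode: the target drifts away at frame 3; the policy only notices at frame 4.
episode = [Step(0.9, 0.8), Step(0.6, 0.7), Step(0.5, 0.2), Step(0.1, 0.1)]
print(rollout(episode))  # [0.0, 0.0, -1.0, 1.0]
```

Only the two frames where the track has failed carry a nonzero reward. A reinforcement learning method would adjust the policy (here, its thresholds) to avoid the -1 on frame 3, which is the kind of decision the abstract proposes to learn rather than hand-tune.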
