Adaptive ROI Generation for Video Object Segmentation Using Reinforcement Learning

Mingjie Sun; Jimin Xiao; Eng Gee Lim; Yanchu Xie; Jiashi Feng

arXiv:1909.12482 · cs.CV · September 30, 2019




TL;DR

This paper introduces a reinforcement learning-based method for adaptive region of interest selection in semi-supervised video object segmentation, significantly improving accuracy and speed over existing approaches.

Contribution

It proposes a novel RL framework with a multi-branch tree exploration method for optimal ROI selection, enhancing online model adaptation in video segmentation.

Findings

01. Achieves 87.1% mean region similarity on DAVIS 2016

02. Outperforms state-of-the-art methods in segmentation accuracy

03. Speeds up model adaptation process
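The "mean region similarity" figure above refers to the Jaccard index (intersection over union) between predicted and ground-truth masks, averaged over frames. The following is a minimal sketch of that metric, not the benchmark's official evaluation code; the function names and the nested-list mask representation are assumptions made for illustration.

```python
def region_similarity(pred, gt):
    """Jaccard index (intersection over union) of two binary masks.

    Masks are equally sized nested lists of 0/1 values (an assumed format).
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_region_similarity(pred_masks, gt_masks):
    """Average the per-frame Jaccard index over a sequence of mask pairs."""
    scores = [region_similarity(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(scores) / len(scores)
```

A score of 1.0 means the predicted mask matches the ground truth exactly, and 0.0 means the two masks do not overlap at all.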

Abstract

In this paper, we tackle the task of semi-supervised video object segmentation across a sequence of frames where only the ground-truth segmentation of the first frame is provided. The challenge lies in updating the segmentation model, initialized from the first frame, online in an adaptive and accurate way, even in the presence of multiple confusing instances or large object motion. Existing approaches rely on selecting a region of interest for the model update, but this selection is rough and inflexible, which degrades performance. To overcome this limitation, we propose a novel approach that uses reinforcement learning to select the optimal adaptation area for each frame, based on historical segmentation information. The RL model learns to take optimal actions to adjust the region of interest inferred from the previous frame for online model updating. To speed up the model…
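The idea of adjusting the previous frame's region of interest through discrete actions can be sketched as follows. This is an illustration only: the action names, the `step` fraction, the `Box` type, and the clipping behaviour are all assumptions, not the action space or update rule used in the paper.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned ROI in pixel coordinates (hypothetical helper type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def adjust_roi(box, action, frame_w, frame_h, step=0.1):
    """Apply one discrete action to the ROI inherited from the previous frame.

    The action set below is hypothetical; `step` is the fraction of the
    current box size used for each shift or rescale.
    """
    dx = max(1, round(step * (box.x1 - box.x0)))
    dy = max(1, round(step * (box.y1 - box.y0)))
    x0, y0, x1, y1 = box
    if action == "left":
        x0, x1 = x0 - dx, x1 - dx
    elif action == "right":
        x0, x1 = x0 + dx, x1 + dx
    elif action == "up":
        y0, y1 = y0 - dy, y1 - dy
    elif action == "down":
        y0, y1 = y0 + dy, y1 + dy
    elif action == "expand":
        x0, y0, x1, y1 = x0 - dx, y0 - dy, x1 + dx, y1 + dy
    elif action == "shrink":
        x0, y0, x1, y1 = x0 + dx, y0 + dy, x1 - dx, y1 - dy
    elif action != "keep":
        raise ValueError(f"unknown action: {action}")
    # Clip to the frame; a shift at the border therefore shrinks the box.
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(frame_w, x1), min(frame_h, y1)
    # Reject degenerate boxes and fall back to the unchanged ROI.
    if x1 <= x0 or y1 <= y0:
        return box
    return Box(x0, y0, x1, y1)
```

In an RL setting, a policy would choose the `action` argument for each frame from the segmentation history, and the adjusted box would then define the area used for the online model update.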


Code & Models

Repositories

insomnia94/ARG (TensorFlow)


Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Neural Network Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings