Adaptive ROI Generation for Video Object Segmentation Using Reinforcement Learning
Mingjie Sun, Jimin Xiao, Eng Gee Lim, Yanchu Xie, Jiashi Feng

TL;DR
This paper introduces a reinforcement learning (RL)-based method for adaptive region-of-interest (ROI) selection in semi-supervised video object segmentation, significantly improving both accuracy and speed over existing approaches.
Contribution
It proposes a novel RL framework with a multi-branch tree exploration method for optimal ROI selection, enhancing online model adaptation in video segmentation.
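The summary does not describe the action space, the scoring function, or the exact tree procedure, so the following is only a minimal sketch of what a multi-branch tree exploration over ROI adjustments could look like. Everything in it is an assumption for illustration: the six actions in the hypothetical `ACTIONS` table (shift left/right/up/down, grow, shrink), the fixed `step`, and the helper `tree_explore`. At each level, every kept node is expanded with all actions and the `branches` best-scoring children of each node survive, so several candidate ROIs are followed in parallel instead of a single greedy path. In the toy usage, the score is IoU against a known target box; in an actual RL agent it would instead come from a learned value or policy.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels

# Hypothetical discrete action set: translate or rescale the ROI by `step` pixels.
ACTIONS: Dict[str, Callable[[Box, float], Box]] = {
    "left":   lambda b, s: (b[0] - s, b[1], b[2] - s, b[3]),
    "right":  lambda b, s: (b[0] + s, b[1], b[2] + s, b[3]),
    "up":     lambda b, s: (b[0], b[1] - s, b[2], b[3] - s),
    "down":   lambda b, s: (b[0], b[1] + s, b[2], b[3] + s),
    "grow":   lambda b, s: (b[0] - s, b[1] - s, b[2] + s, b[3] + s),
    "shrink": lambda b, s: (b[0] + s, b[1] + s, b[2] - s, b[3] - s),
}


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (degenerate boxes have area 0)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(r: Box) -> float:
        return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


Node = Tuple[float, Box, List[str]]  # (score, box, action path from the root)


def tree_explore(root: Box, score: Callable[[Box], float],
                 branches: int = 2, depth: int = 3, step: float = 4.0) -> Node:
    """Expand every kept node with all actions, keep the `branches` best children
    of each node, and return the best (score, box, path) seen at any depth."""
    best: Node = (score(root), root, [])
    frontier: List[Node] = [best]
    for _level in range(depth):
        children: List[Node] = []
        for _, box, path in frontier:
            cands: List[Node] = []
            for name, act in ACTIONS.items():
                new_box = act(box, step)
                cands.append((score(new_box), new_box, path + [name]))
            cands.sort(key=lambda c: c[0], reverse=True)
            children.extend(cands[:branches])  # multi-branch: several children survive
        frontier = children
        for node in children:
            if node[0] > best[0]:  # strict: the shallowest best node is kept on ties
                best = node
    return best


if __name__ == "__main__":
    target = (8.0, 0.0, 18.0, 10.0)  # toy stand-in for the "ideal" ROI
    print(tree_explore((0.0, 0.0, 10.0, 10.0), lambda b: box_iou(b, target)))
```

With `branches=1` this collapses to a greedy one-action-at-a-time search; larger values trade extra scoring calls (up to `branches**depth` kept nodes at the last level) for a better chance of escaping a locally attractive but wrong adjustment.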
Findings
Achieves 87.1% mean region similarity on DAVIS 2016
Outperforms state-of-the-art methods in segmentation accuracy
Speeds up the model adaptation process
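On DAVIS, "region similarity" (J) is the intersection over union between a predicted mask and the ground-truth mask, and the 87.1% figure above is a mean of this score. The following is a minimal sketch of the per-frame computation and a plain average over frames; it is not the official DAVIS evaluation code, which also handles per-sequence averaging and other benchmark details. Treating two empty masks as a perfect match is an assumed convention here.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (IoU) between a predicted and a ground-truth binary mask."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as perfect agreement (assumed convention)
    return float(np.logical_and(pred, gt).sum() / union)


def mean_region_similarity(preds, gts) -> float:
    """Average J over paired masks (a simple mean, not the benchmark's full protocol)."""
    return float(np.mean([region_similarity(p, g) for p, g in zip(preds, gts)]))
```

For example, a prediction covering two pixels, one of which overlaps a one-pixel ground-truth object, scores J = 1/2.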
Abstract
In this paper, we aim to tackle semi-supervised video object segmentation across a sequence of frames, where only the ground-truth segmentation of the first frame is provided. The challenge lies in updating the segmentation model initialized on the first frame online, both adaptively and accurately, even in the presence of multiple confusing instances or large object motion. Existing approaches select a region of interest for the model update, but this selection is rough and inflexible, which degrades performance. To overcome this limitation, we propose a novel approach that uses reinforcement learning to select the optimal adaptation area for each frame based on historical segmentation information. The RL model learns to take optimal actions that adjust the region of interest inferred from the previous frame for online model updating. To speed up the model…
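The abstract says the agent adjusts a region of interest "inferred from the previous frame" but does not say how that initial ROI is built. A common way to obtain such a starting box, and roughly the kind of fixed rule the abstract calls rough and inflexible, is to take the bounding box of the previous frame's predicted mask and enlarge it by a margin. The sketch below assumes exactly that; the function name `roi_from_previous_mask`, the 20% `margin`, and the full-frame fallback when the mask is empty are all hypothetical choices, not details taken from the paper.

```python
import numpy as np


def roi_from_previous_mask(prev_mask: np.ndarray, margin: float = 0.2):
    """Return (x0, y0, x1, y1) around the previous frame's mask, enlarged by
    `margin` times the box size on each side and clipped to the image."""
    h, w = prev_mask.shape
    ys, xs = np.nonzero(prev_mask)
    if xs.size == 0:
        return (0, 0, w, h)  # object lost in the previous frame: use the whole frame
    x0, x1 = int(xs.min()), int(xs.max()) + 1  # half-open box [x0, x1)
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    pad_x = int(round(margin * (x1 - x0)))
    pad_y = int(round(margin * (y1 - y0)))
    return (max(0, x0 - pad_x), max(0, y0 - pad_y),
            min(w, x1 + pad_x), min(h, y1 + pad_y))
```

A box produced this way would only be the starting state; the RL agent's actions then shift or resize it before the frame is used for online adaptation.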
Taxonomy
Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
