Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Zhaoyuan Yin; Jia Zheng; Weixin Luo; Shenhan Qian; Hanling Zhang; Shenghua Gao

arXiv:2103.10391 · cs.CV · June 18, 2021

TL;DR

This paper introduces a deep reinforcement learning-based framework for selecting the most valuable frames in interactive video object segmentation, improving practicality and performance without altering existing algorithms.

Contribution

It formulates frame selection as a Markov Decision Process and trains an agent to automatically identify the most beneficial frames for annotation in the wild.
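
The Markov Decision Process view can be sketched with a toy example: the state describes the current segmentation, the action is the frame recommended for annotation, and the reward is the improvement across the whole video. Everything below is an illustrative assumption, not the authors' implementation: the state is simply the true per-frame quality (which is not available at test time in the paper's setting), `annotate` is a hypothetical transition in which an annotation also helps nearby frames, and `lookahead_policy` is a one-step oracle standing in for a learned agent.

```python
from typing import Callable, List, Tuple

State = List[float]  # toy state: per-frame segmentation quality in [0, 1]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def annotate(state: State, frame: int, spread: int = 2) -> State:
    """Hypothetical transition: annotating `frame` fixes it fully and
    improves frames within `spread` with linearly decaying strength."""
    next_state = []
    for i, q in enumerate(state):
        weight = max(0.0, 1.0 - abs(i - frame) / (spread + 1))
        next_state.append(q + (1.0 - q) * weight)
    return next_state


def worst_frame_policy(state: State) -> int:
    """Baseline: pick the frame with the lowest quality score."""
    return min(range(len(state)), key=lambda i: state[i])


def lookahead_policy(state: State) -> int:
    """Oracle stand-in for a learned agent: pick the frame whose
    annotation yields the highest mean quality after one step."""
    return max(range(len(state)), key=lambda i: mean(annotate(state, i)))


def rollout(
    state: State, policy: Callable[[State], int], rounds: int
) -> Tuple[List[int], List[float], State]:
    """Run `rounds` interaction rounds; reward = gain in mean quality."""
    actions, rewards = [], []
    for _ in range(rounds):
        frame = policy(state)                # action: recommended frame
        next_state = annotate(state, frame)  # user annotates, masks refined
        rewards.append(mean(next_state) - mean(state))
        actions.append(frame)
        state = next_state
    return actions, rewards, state


# Frame 0 is the worst frame, but its neighbours are already good.
initial = [0.2, 0.9, 0.9, 0.5, 0.5, 0.5, 0.5, 0.5]
worst_actions, worst_rewards, _ = rollout(initial, worst_frame_policy, rounds=1)
best_actions, best_rewards, _ = rollout(initial, lookahead_policy, rounds=1)
```

In this toy video the worst-frame baseline picks frame 0, while the lookahead policy picks frame 5 and obtains a larger reward, because annotating a frame surrounded by mediocre frames improves more of the video. A learned agent would have to approximate this choice from observations that do not require ground truth.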

Findings

01

The learned agent effectively improves segmentation performance.

02

The approach outperforms traditional worst-metric-based frame selection.

03

No changes are needed to existing VOS algorithms.

Abstract

This paper proposes a framework for interactive video object segmentation (VOS) in the wild, where users iteratively choose frames to annotate. Then, based on the user annotations, a segmentation algorithm refines the masks. The previous interactive VOS paradigm selects the frame with the worst evaluation metric, and the ground truth is required for calculating the evaluation metric, which is impractical in the testing phase. In contrast, we argue that the frame with the worst evaluation metric may not be the most valuable frame, that is, the one leading to the largest performance improvement across the video. Thus, we formulate the frame selection problem in interactive VOS as a Markov Decision Process, where an agent is learned to recommend the frame under a deep reinforcement learning framework. The learned agent can automatically determine the most…

Code & Models

Repositories

svip-lab/IVOS-W
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image and Video Quality Assessment

Methods: VOS