Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Zhaoyuan Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanling Zhang, Shenghua Gao

TL;DR
This paper introduces a deep reinforcement learning-based framework for selecting the most valuable frames in interactive video object segmentation, improving practicality and performance without altering existing algorithms.
Contribution
It formulates frame selection as a Markov Decision Process and trains an agent to automatically identify the most beneficial frames for annotation in the wild.
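The MDP framing above can be sketched as a toy environment. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `FrameSelectionEnv`, the `spread` parameter, the propagation rule, and the reward (change in mean per-frame quality, which would need ground truth and so is only available during training) are all hypothetical stand-ins for a real VOS algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class FrameSelectionEnv:
    """Toy MDP for frame recommendation (hypothetical, not the paper's code)."""

    quality: list          # assumed per-frame segmentation quality in [0, 1]
    spread: int = 2        # assumed reach of one annotation, in frames
    annotated: set = field(default_factory=set)

    def state(self):
        # State: current per-frame quality plus the frames already annotated.
        return tuple(self.quality), frozenset(self.annotated)

    def step(self, frame):
        """Action: annotate `frame`. Returns (next_state, reward)."""
        before = sum(self.quality) / len(self.quality)
        self.annotated.add(frame)
        for t in range(len(self.quality)):
            d = abs(t - frame)
            if d <= self.spread:
                # Assumed propagation: close the gap to 1.0, less so farther away.
                gap = 1.0 - self.quality[t]
                self.quality[t] += gap * (1.0 - d / (self.spread + 1))
        after = sum(self.quality) / len(self.quality)
        # Assumed reward: improvement in mean quality across the whole video.
        return self.state(), after - before


env = FrameSelectionEnv(quality=[0.5, 0.5, 0.5, 0.5, 0.5])
_, reward = env.step(2)  # the agent recommends frame 2 for annotation
```

In a trained system, a learned policy would pick `frame` from the state instead of the hard-coded index used here.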
Findings
The learned agent effectively improves segmentation performance.
The approach outperforms traditional worst-metric-based frame selection.
The framework requires no changes to existing VOS algorithms.
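To illustrate why the worst-scoring frame need not be the most valuable one, here is a small sketch with made-up numbers. The quality values and the propagation rule in `mean_after_annotation` are assumptions chosen for illustration only; they do not come from the paper.

```python
def mean_after_annotation(quality, frame, spread=1):
    """Mean quality after annotating `frame` under an assumed toy propagation model."""
    updated = list(quality)
    for t in range(len(quality)):
        d = abs(t - frame)
        if d <= spread:
            # Assumed: the annotated frame becomes perfect, neighbours close half the gap.
            updated[t] += (1.0 - quality[t]) * (1.0 - d / (spread + 1))
    return sum(updated) / len(updated)


# Hypothetical per-frame quality scores for a six-frame video.
quality = [0.2, 0.9, 0.9, 0.4, 0.3, 0.4]

# Worst-metric selection: pick the frame with the lowest current quality.
worst_frame = min(range(len(quality)), key=quality.__getitem__)

# Most valuable frame: pick the one whose annotation raises the video-wide mean most.
most_valuable = max(
    range(len(quality)),
    key=lambda f: mean_after_annotation(quality, f),
)
```

In this toy setup the worst frame is an isolated failure next to well-segmented frames, whereas annotating a frame inside a run of mediocre frames helps its neighbours too, so the two selection rules pick different frames.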
Abstract
This paper proposes a framework for interactive video object segmentation (VOS) in the wild, where users can iteratively choose frames to annotate. Based on the user annotations, a segmentation algorithm then refines the masks. The previous interactive VOS paradigm selects the frame with the worst value of some evaluation metric, but computing that metric requires the ground truth, which is impractical in the testing phase. Moreover, we argue that the frame with the worst evaluation metric may not be the most valuable frame, that is, the one that leads to the largest performance improvement across the video. Thus, we formulate the frame selection problem in interactive VOS as a Markov Decision Process, where an agent is learned to recommend the frame under a deep reinforcement learning framework. The learned agent can automatically determine the most…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image and Video Quality Assessment
Methods: VOS
