M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Zixuan Chen, Jiaxin Li, Liming Tan, Yejie Guo, Junxuan Liang, Cewu Lu, Yong-Lu Li

TL;DR
This paper introduces M$^3$-VOS, a comprehensive benchmark for video object segmentation that accounts for object phase transitions, and proposes ReVOS, a model that enhances segmentation by reversing entropy processes.
Contribution
The paper presents a new benchmark dataset for phase-aware video object segmentation and introduces ReVOS, a novel model that improves segmentation performance through entropy reversal refinement.
Findings
Current video object segmentation methods struggle when objects undergo phase transitions.
ReVOS outperforms existing approaches by reversing entropy processes.
The benchmark includes 479 high-resolution videos spanning more than 10 everyday scenarios.
Abstract
Intelligent robots need to interact with diverse objects across various environments. The appearance and state of objects frequently undergo complex transformations depending on the object properties, e.g., phase transitions. However, in the vision community, segmenting dynamic objects with phase transitions is overlooked. In light of this, we introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. Then, we present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases, which consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios. It provides dense instance mask annotations that capture both object phases and their transitions. We…
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
