M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Zixuan Chen; Jiaxin Li; Liming Tan; Yejie Guo; Junxuan Liang; Cewu Lu; Yong-Lu Li

arXiv:2412.13803·cs.CV·June 3, 2025

TL;DR

This paper introduces M$^3$-VOS, a comprehensive benchmark for video object segmentation that accounts for object phase transitions, and proposes ReVOS, a model that enhances segmentation by reversing entropy processes.

Contribution

The paper presents a new benchmark dataset for phase-aware video object segmentation and introduces ReVOS, a novel model that improves segmentation performance through entropy reversal refinement.

Findings

01 Current VOS methods struggle to segment objects that undergo phase transitions.

02 ReVOS outperforms existing approaches by reversing entropy processes.

03 The benchmark includes 479 high-resolution videos spanning more than 10 everyday scenarios.

Abstract

Intelligent robots need to interact with diverse objects across various environments. The appearance and state of objects frequently undergo complex transformations depending on the object properties, e.g., phase transitions. However, in the vision community, segmenting dynamic objects with phase transitions is overlooked. In light of this, we introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. Then, we present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases, which consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios. It provides dense instance mask annotations that capture both object phases and their transitions. We…

Code & Models

Repositories

zixuan-chen/M3VOS_Experiment
pytorch

Datasets

Lijiaxin0111/M3_VOS
dataset· 737 dl

Taxonomy

Topics: Medical Image Segmentation Techniques · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques