3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation
Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang

TL;DR
This paper presents a robust video object segmentation method inspired by the Cutie model, demonstrating high accuracy on complex videos with occlusions, and securing third place in the CVPR 2024 MOSE challenge.
Contribution
We investigate the impact of object memory, the total number of memory frames, and input resolution on segmentation performance, and validate our inference method on the MOSE dataset, which features complex occlusions.
Findings
Achieved a J&F score of 0.8139 on the MOSE test set.
Validated the effectiveness of our inference method on complex occlusion videos.
Secured third place in the CVPR 2024 MOSE challenge.
Abstract
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames. Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance. This report validates the effectiveness of our inference method on the coMplex video Object SEgmentation (MOSE) dataset, which features complex occlusions. Our experimental results demonstrate that our approach achieves a J&F score of 0.8139 on the test set, securing the third position in the final ranking. These findings highlight the robustness and accuracy of our method in handling challenging VOS scenarios.
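For readers unfamiliar with the metric, J&F is commonly defined as the mean of region similarity J (the intersection over union of the predicted and ground-truth masks) and contour accuracy F (an F-measure over mask boundaries). The following is a minimal single-frame, single-object sketch of that idea, not the official challenge evaluation code. The function names, the square dilation, and the fixed pixel tolerance `tol` are assumptions made for illustration; official toolkits match boundaries differently and average over all objects and frames.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match
    return np.logical_and(pred, gt).sum() / union

def boundary(mask):
    """Mask pixels with at least one 4-neighbour outside the mask."""
    mask = mask.astype(bool)
    m = np.pad(mask, 1)  # pad with False so image-border pixels count as boundary
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return mask & ~interior

def dilate(mask, radius):
    """Square dilation by `radius` pixels, built from shifted copies."""
    h, w = mask.shape
    m = np.pad(mask, radius)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= m[dy:dy + h, dx:dx + w]
    return out

def boundary_f(pred, gt, tol=1):
    """Simplified contour accuracy F (assumed tolerance of `tol` pixels)."""
    bp, bg = boundary(pred), boundary(gt)
    if bp.sum() == 0 and bg.sum() == 0:
        return 1.0
    if bp.sum() == 0 or bg.sum() == 0:
        return 0.0
    precision = (bp & dilate(bg, tol)).sum() / bp.sum()
    recall = (bg & dilate(bp, tol)).sum() / bg.sum()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def j_and_f(pred, gt, tol=1):
    """Mean of region similarity J and the simplified contour accuracy F."""
    return 0.5 * (jaccard(pred, gt) + boundary_f(pred, gt, tol))
```

A reported score such as 0.8139 would correspond to this kind of per-frame value averaged over every annotated object and frame in the test set.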
Taxonomy
Topics: Image Processing Techniques and Applications · Industrial Vision Systems and Defect Detection
Methods: VOS
