LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS
Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, Lingling Li

TL;DR
This paper combines the SAM2 and Cutie models for video object segmentation, achieving a J&F score of 0.7952 and third place in the LSVOS challenge VOS track, and analyzes how hyperparameters affect segmentation performance.
Contribution
It introduces a novel combination of SOTA models SAM2 and Cutie for VOS and evaluates hyperparameter impacts on segmentation performance.
Findings
Achieved a J&F score of 0.7952 in the testing phase of the LSVOS challenge VOS track.
Ranked third overall in the VOS track.
Demonstrated the effectiveness of combining SAM2 and Cutie models.
Abstract
Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the disappearance and reappearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on video instance segmentation performance. Our approach achieves a J&F score of 0.7952 in the testing phase of the LSVOS challenge VOS track, ranking third overall.
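For readers unfamiliar with the reported metric: J&F is the mean of region similarity J (the intersection-over-union of predicted and ground-truth masks) and contour accuracy F (an F-measure over mask boundaries). The following is a minimal single-frame, single-object sketch of that idea, not the challenge's evaluation code. The function names, the 4-neighbour boundary extraction, and the square tolerance window `tol` are all assumptions made for illustration; benchmark toolkits use their own boundary matching and average over objects and frames.

```python
import numpy as np

def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated here as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def boundary(mask):
    """Foreground pixels with at least one 4-neighbour outside the mask."""
    mask = mask.astype(bool)
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior

def dilate(mask, radius):
    """Dilate a boolean mask with a (2*radius+1) square window."""
    h, w = mask.shape
    p = np.pad(mask, radius, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def contour_accuracy(pred, gt, tol=1):
    """F: harmonic mean of boundary precision and recall.

    A boundary pixel counts as matched if the other mask's boundary
    lies within `tol` pixels (simplified, assumed matching rule).
    """
    pb, gb = boundary(pred), boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0
    if pb.sum() == 0 or gb.sum() == 0:
        return 0.0
    precision = (pb & dilate(gb, tol)).sum() / pb.sum()
    recall = (gb & dilate(pb, tol)).sum() / gb.sum()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

def j_and_f(pred, gt, tol=1):
    """J&F for one frame and one object: the mean of J and F."""
    return 0.5 * (region_similarity(pred, gt) + contour_accuracy(pred, gt, tol))
```

Under this sketch, a sequence-level score would be obtained by averaging J and F over all annotated frames and objects before taking their mean.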
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: VOS
