LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS

Xinyu Liu; Jing Zhang; Kexin Zhang; Xu Liu; Lingling Li

arXiv:2408.10469·cs.CV·August 22, 2024

TL;DR

This paper combines the SAM2 and Cutie models for video object segmentation, achieving a J&F score of 0.7952 and third place in the VOS track of the LSVOS challenge, and analyzes how hyperparameters affect segmentation performance.

Contribution

It introduces a novel combination of SOTA models SAM2 and Cutie for VOS and evaluates hyperparameter impacts on segmentation performance.

Findings

01. Achieved a J&F score of 0.7952 in the LSVOS challenge.
02. Ranked third overall in the VOS track.
03. Demonstrated the effectiveness of combining SAM2 and Cutie models.

Abstract

Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the disappearance and reappearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on video instance segmentation performance. Our approach achieves a J&F score of 0.7952 in the testing phase of the LSVOS challenge VOS track, ranking third overall.
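
For readers unfamiliar with the metric, J&F as commonly defined in VOS benchmarks is the mean of region similarity J (mask intersection over union) and boundary accuracy F (a boundary F-measure). The sketch below is not the challenge's evaluation code; it only illustrates J on a toy mask and how J and F are averaged. The names `jaccard` and `j_and_f`, the nested-list mask representation, and the empty-mask convention are assumptions made for illustration, and F is taken as a given number rather than computed.

```python
from typing import List

# Illustrative representation: a binary mask as nested lists of 0/1.
Mask = List[List[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j: float, f: float) -> float:
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2.0


# Toy example: the prediction covers one extra pixel compared to the ground truth.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 0],
      [0, 0, 0]]

j = jaccard(pred, gt)  # 3 shared pixels out of 4 covered pixels
print(f"J = {j:.2f}")
```

In a full evaluation, J and F are computed per object and per frame and then averaged over the dataset; the reported 0.7952 is that aggregate.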

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: VOS