Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking

Cheng-Yen Yang; Hsiang-Wei Huang; Pyong-Kun Kim; Chien-Kai Kuo; Jui-Wei Chang; Kwang-Ju Kim; Chung-I Huang; Jenq-Neng Hwang

arXiv:2505.18111·cs.CV·May 26, 2025

TL;DR

This paper adapts the Segment Anything Model 2 (SAM2) to visual object tracking (VOT). Combined with the authors' proposed optimizations, the adapted model took first place in the 2024 ICPR Multi-modal Object Tracking challenge with an AUC score of 89.4.

Contribution

The paper introduces an adaptation of SAM2 for visual object tracking (VOT) that builds on the model's pre-trained capabilities and adds several enhancements to improve tracking performance on multi-modal data.

Findings

01

Achieved first place with an AUC score of 89.4 in the 2024 ICPR Multi-modal Object Tracking challenge.

02

Demonstrated the effectiveness of SAM2 adaptation for multi-modal visual tracking.

03

Provided a comprehensive analysis of the results in the context of VOT solutions and the multi-modal nature of the dataset.

Abstract

We present an effective approach for adapting the Segment Anything Model 2 (SAM2) to the Visual Object Tracking (VOT) task. Our method leverages the powerful pre-trained capabilities of SAM2 and incorporates several key techniques to enhance its performance in VOT applications. By combining SAM2 with our proposed optimizations, we achieved a first-place AUC score of 89.4 on the 2024 ICPR Multi-modal Object Tracking challenge, demonstrating the effectiveness of our approach. This paper details our methodology, the specific enhancements made to SAM2, and a comprehensive analysis of our results in the context of VOT solutions, along with the multi-modality aspect of the dataset.