ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

Yuanyou Xu; Jiahao Li; Zongxin Yang; Yi Yang; Yueting Zhuang

arXiv:2307.02508·cs.CV·July 11, 2023

TL;DR

This paper tackles single object tracking in video by converting reference-frame bounding boxes to masks with segmentation models (SAM and Alpha-Refine) and propagating those masks with a multi-scale transformer (MSDeAOT), taking 1st place in the EPIC-KITCHENS TREK-150 Object Tracking Challenge.

Contribution

The study introduces MSDeAOT, a variant of the AOT series that applies transformers at two feature scales (16 and 8) to propagate object masks from previous frames to the current frame.

Findings

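01. Achieved 1st place in the EPIC-KITCHENS TREK-150 Object Tracking Challenge.

02. Converted reference-frame bounding boxes to masks with SAM and Alpha-Refine, recasting tracking as video object segmentation.

03. Showed that a multi-scale transformer (MSDeAOT, feature scales 16 and 8) is effective for propagating object masks across frames.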

Abstract

The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object tracking and segmentation. In this study, we convert the bounding boxes in reference frames to masks with the help of the Segment Anything Model (SAM) and Alpha-Refine, and then propagate the masks to the current frame, transforming the task from Video Object Tracking (VOT) to Video Object Segmentation (VOS). Furthermore, we introduce MSDeAOT, a variant of the AOT series that incorporates transformers at multiple feature scales. MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8. As a testament to the effectiveness of our design, we achieved 1st place in the EPIC-KITCHENS TREK-150 Object Tracking Challenge.
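
The abstract describes going from boxes to masks but does not say how a propagated mask is turned back into a box for the tracking output. A minimal sketch of that round trip follows, assuming the output box is simply the tight axis-aligned box around the mask's foreground pixels. The names `box_to_coarse_mask` and `mask_to_box` are hypothetical, and filling the box with ones is only a crude stand-in for the SAM / Alpha-Refine step, not the authors' implementation.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # binary mask: rows of 0/1 values
Box = Tuple[int, int, int, int]   # (x, y, width, height) in pixels


def box_to_coarse_mask(box: Box, height: int, width: int) -> Mask:
    """Fill the box region with 1s.

    Assumption: this is a placeholder for a real box-prompted segmentation
    model, which would return the object's shape rather than a rectangle.
    """
    x, y, w, h = box
    return [
        [1 if (x <= c < x + w and y <= r < y + h) else 0 for c in range(width)]
        for r in range(height)
    ]


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tight (x, y, width, height) box around foreground pixels.

    Returns None when the mask is empty, e.g. when the object is not visible.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x0, x1 = cols[0], cols[-1]
    y0, y1 = rows[0], rows[-1]
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)


if __name__ == "__main__":
    reference_box = (2, 1, 3, 2)
    mask = box_to_coarse_mask(reference_box, height=5, width=8)
    print(mask_to_box(mask))
```

With a rectangular placeholder mask the round trip returns the original box. With a real segmentation mask, the recovered box would instead follow the object's visible extent in each frame.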

Taxonomy

Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Advanced Neural Network Applications