ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation

Jiahao Li; Yuanyou Xu; Zongxin Yang; Yi Yang; Yueting Zhuang

arXiv:2307.02010 · cs.CV · July 11, 2023

Open Access

TL;DR

This paper introduces MSDeAOT, a multi-scale transformer-based framework with hierarchical propagation modules that ranked first in the semi-supervised video object segmentation track of the EPIC-KITCHEN Challenge 2023.

Contribution

MSDeAOT extends the AOT framework by integrating multi-scale transformers and hierarchical Gated Propagation Modules for improved segmentation accuracy.
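The core idea of a gated propagation step can be illustrated with a toy example: tokens of the current frame (queries) attend to tokens of a previous frame (keys), whose values carry an embedding of that frame's object mask, and the attended result is scaled elementwise by a sigmoid gate. This is a simplified, hypothetical sketch of gated attention, not the authors' GPM: it omits learned projections, multiple heads, normalization, and memory management, and the gate logits are passed in directly rather than computed from features. The function names are illustrative only.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def sigmoid(x: float) -> float:
    """Sigmoid that avoids overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_propagate(queries: List[Vec], keys: List[Vec],
                    values: List[Vec], gate_logits: List[Vec]) -> List[Vec]:
    """Hypothetical single-head gated attention.

    queries:     current-frame tokens, one vector each
    keys/values: previous-frame tokens; values encode the previous mask
    gate_logits: one gate vector per query token (same width as values)
    """
    d = len(queries[0])
    out = []
    for q, g in zip(queries, gate_logits):
        # Scaled dot-product attention weights over previous-frame tokens.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                           for k in keys])
        # Weighted sum of the mask-carrying values.
        attended = [sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))]
        # Elementwise sigmoid gate controls how much propagated info passes.
        out.append([a * sigmoid(gi) for a, gi in zip(attended, g)])
    return out
```

With one-hot values standing in for two object labels, a query that matches the first key inherits mostly the first label, and a gate logit of 0 halves it while a large positive logit lets it through almost unchanged.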

Findings

01

Achieved first place in the EPIC-KITCHEN VISOR Semi-supervised Video Object Segmentation Challenge.

02

Improved detection and tracking of small objects by also propagating masks at a finer feature scale (stride 8).

03

Used test-time augmentation and model ensembling to further boost performance.
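Test-time augmentation and ensembling for segmentation are commonly implemented as probability averaging. The sketch below shows that general pattern under stated assumptions: horizontal flipping is the only augmentation, each model is a hypothetical callable returning per-object probability maps (object 0 = background), and all predictions are averaged with equal weights before a per-pixel argmax. The augmentations, weights, and models the authors actually combined are not specified here.

```python
from typing import Callable, List

# probs[k][y][x]: probability that pixel (y, x) belongs to object k (k = 0 is background).
ProbMap = List[List[List[float]]]
Frame = List[List[float]]
Model = Callable[[Frame], ProbMap]  # hypothetical model interface


def hflip(grid):
    """Mirror a 2-D grid left-to-right."""
    return [row[::-1] for row in grid]


def predict_with_flip_tta(model: Model, frame: Frame) -> ProbMap:
    """Average predictions on the original and horizontally flipped frame."""
    plain = model(frame)
    # Predict on the flipped frame, then flip each object's map back.
    unflipped = [hflip(m) for m in model(hflip(frame))]
    return [[[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
            for m1, m2 in zip(plain, unflipped)]


def ensemble(models: List[Model], frame: Frame) -> List[List[int]]:
    """Average TTA probabilities across models, then take the per-pixel argmax."""
    maps = [predict_with_flip_tta(m, frame) for m in models]
    num_obj = len(maps[0])
    h, w = len(maps[0][0]), len(maps[0][0][0])
    labels = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = [sum(mp[k][y][x] for mp in maps) / len(maps)
                      for k in range(num_obj)]
            row.append(max(range(num_obj), key=scores.__getitem__))
        labels.append(row)
    return labels
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one at each pixel, which is one reason this style of fusion is popular for mask ensembles.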

Abstract

The Associating Objects with Transformers (AOT) framework has shown exceptional performance across a wide range of complex video object segmentation scenarios. In this study, we introduce MSDeAOT, a variant of the AOT series that incorporates transformers at multiple feature scales. Using the hierarchical Gated Propagation Module (GPM), MSDeAOT efficiently propagates object masks from previous frames to the current frame on a feature map with a stride of 16. We additionally apply GPM at a finer feature scale with a stride of 8, which improves the detection and tracking of small objects. With test-time augmentation and model ensembling, we achieve the top-ranking position in the EPIC-KITCHEN VISOR Semi-supervised Video Object Segmentation Challenge.
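A quick back-of-the-envelope calculation suggests why adding a stride-8 scale helps small objects: halving the stride doubles the feature-map resolution along each axis, so an object covers four times as many feature cells. The frame size and object size below are hypothetical, and computing the feature-map size by ceiling division is an assumption; real backbones may pad or round differently.

```python
import math
from typing import Tuple


def feature_grid(height: int, width: int, stride: int) -> Tuple[int, int]:
    """Feature-map size for a given total stride (ceiling division assumed)."""
    return math.ceil(height / stride), math.ceil(width / stride)


def cells_covered(obj_h: int, obj_w: int, stride: int) -> float:
    """Approximate number of feature cells spanned by an object's bounding box."""
    return (obj_h / stride) * (obj_w / stride)


# Hypothetical 480x864 frame containing a 24x24-pixel object.
for s in (16, 8):
    print(s, feature_grid(480, 864, s), cells_covered(24, 24, s))
# 16 (30, 54) 2.25
# 8 (60, 108) 9.0
```

At stride 16 such an object is represented by barely two feature cells, so its mask boundary is very coarse; at stride 8 it spans nine cells.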

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques