TrickVOS: A Bag of Tricks for Video Object Segmentation

Evangelos Skartados; Konstantinos Georgiadis; Mehmet Kerim Yucel; Koskinas Ioannis; Armando Domi; Anastasios Drosou; Bruno Manganelli; Albert Saa-Garriga

arXiv:2306.15377 · cs.CV · June 29, 2023

TL;DR

TrickVOS introduces a set of practical enhancements for space-time memory (STM) networks in semi-supervised video object segmentation (SVOS), targeting the supervisory signal, pretraining, and spatial awareness. A lightweight network trained with these enhancements runs in real time on a mobile device.

Contribution

It proposes a generic, method-agnostic bag of tricks for STM-based SVOS methods: a structure-aware hybrid loss, a simple decoder pretraining regime, and a cheap tracker that imposes spatial constraints on model predictions.

Findings

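01. Achieves results competitive with state-of-the-art methods on the DAVIS and YouTube benchmarks.

02. A lightweight network trained with TrickVOS runs in real time on a mobile device, making it one of the first STM-based SVOS methods to do so.

03. Improves segmentation accuracy through method-agnostic, low-cost tricks, while the proposed lightweight network keeps inference efficient.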

Abstract

Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects in which such methods can be improved: i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS: a generic, method-agnostic bag of tricks addressing each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints on model predictions. Finally, we propose a lightweight network and show that, when trained with TrickVOS, it achieves results competitive with state-of-the-art methods on the DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real time on a mobile device.
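The abstract names the tricks but does not give their formulas. As a purely hypothetical illustration, the dependency-free Python sketch below models two of them under assumptions that are not taken from the paper. The structure-aware hybrid loss is modeled as pixel-wise binary cross-entropy plus a soft IoU term, a common pairing in segmentation that combines a pixel-level and a region-level signal. The cheap tracker is modeled as zeroing out predictions outside the previous frame's bounding box, enlarged by a margin. All function names and the `w_iou` and `margin` parameters are illustrative inventions, not the authors' code.

```python
"""Hypothetical sketch of two TrickVOS-style ingredients on small binary masks.

Assumptions (NOT from the paper): the hybrid loss is modeled as pixel-wise BCE
plus a soft IoU term, and the "cheap tracker" is modeled as suppressing
predictions outside the previous frame's bounding box padded by a margin.
"""
import math
from typing import List, Optional, Tuple

Mask = List[List[float]]  # H x W grid of per-pixel probabilities in [0, 1]


def bce_loss(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all pixels (pixel-level signal)."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            n += 1
    return total / n


def soft_iou_loss(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """1 - soft IoU, which scores the predicted object region as a whole."""
    pairs = [(p, t) for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    inter = sum(p * t for p, t in pairs)
    union = sum(p + t - p * t for p, t in pairs)
    return 1.0 - (inter + eps) / (union + eps)


def hybrid_loss(pred: Mask, target: Mask, w_iou: float = 1.0) -> float:
    """BCE plus weighted soft IoU loss (hypothetical weighting)."""
    return bce_loss(pred, target) + w_iou * soft_iou_loss(pred, target)


def bounding_box(mask: Mask, thr: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive (top, left, bottom, right) box of pixels above thr, or None."""
    rows = [i for i, row in enumerate(mask) if any(v > thr for v in row)]
    if not rows:
        return None
    cols = [j for j in range(len(mask[0])) if any(row[j] > thr for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def apply_spatial_constraint(pred: Mask, prev_mask: Mask, margin: int = 1) -> Mask:
    """Zero out predictions outside the previous frame's padded bounding box."""
    box = bounding_box(prev_mask)
    if box is None:  # object not visible before: leave prediction unconstrained
        return [row[:] for row in pred]
    top, left, bottom, right = box
    h, w = len(pred), len(pred[0])
    # Pad the box by `margin` pixels, clipped to the image borders.
    top, left = max(top - margin, 0), max(left - margin, 0)
    bottom, right = min(bottom + margin, h - 1), min(right + margin, w - 1)
    return [
        [v if top <= i <= bottom and left <= j <= right else 0.0
         for j, v in enumerate(row)]
        for i, row in enumerate(pred)
    ]
```

For example, if the previous-frame mask occupies rows 1-2 and columns 1-2 of a 6x6 grid and `margin=1`, the padded box spans rows and columns 0-3, so a spurious response at pixel (5, 5) is set to 0 while responses near the object are kept. A box is used here only because it is about the cheapest spatial prior available; the paper's actual tracker may differ.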

Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques