VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement

Hanjung Kim; Jaehyun Kang; Miran Heo; Sukjun Hwang; Seoung Wug Oh; Seon Joo Kim

arXiv:2312.04885·cs.CV·March 11, 2024·1 cites

TL;DR

VISAGE introduces appearance-guided enhancements to improve object association in video instance segmentation, addressing the over-reliance on location cues and achieving state-of-the-art results on multiple benchmarks.

Contribution

The paper proposes a novel extension to object decoders that explicitly captures appearance features, significantly improving tracking accuracy in VIS tasks.

Findings

01. Achieves state-of-the-art results on YouTube-VIS 2019/2021 and OVIS datasets.

02. Constructs a synthetic dataset to evaluate appearance awareness.

03. Enhances object association by explicitly modeling appearance features.

Abstract

In recent years, online Video Instance Segmentation (VIS) methods have advanced remarkably thanks to powerful query-based detectors. By using the detector's frame-level output queries, these methods achieve high accuracy on challenging benchmarks. However, we observe that these methods rely heavily on location information, which often causes incorrect associations between objects. This paper argues that appearance information is a key axis of object matching in trackers, and that it becomes especially informative when positional cues are insufficient to distinguish object identities. We therefore propose a simple yet powerful extension to object decoders that explicitly extracts embeddings from backbone features and drives queries to capture the appearances of objects, which greatly improves instance association accuracy.…
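To make the association idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each instance carries two vectors, a detector query embedding (which the abstract describes as location-heavy) and a separate appearance embedding, and it scores candidate matches with a weighted sum of cosine similarities. The names `associate`, `appearance_weight`, and the `"query"`/`"appearance"` keys are hypothetical, and the brute-force search over permutations stands in for the Hungarian matching (for example SciPy's `linear_sum_assignment`) that a real tracker would more likely use.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def associate(prev, curr, appearance_weight=0.5):
    """Match previous-frame instances to current-frame instances.

    prev, curr: equal-length lists of dicts with the hypothetical keys
    "query" and "appearance", each holding a list of floats.
    Returns a list m where m[i] is the index in curr matched to prev[i].
    Brute force over permutations, so only suitable for a few instances.
    """
    assert len(prev) == len(curr)

    def score(i, j):
        q = cosine(prev[i]["query"], curr[j]["query"])
        a = cosine(prev[i]["appearance"], curr[j]["appearance"])
        return (1 - appearance_weight) * q + appearance_weight * a

    best = max(
        permutations(range(len(curr))),
        key=lambda p: sum(score(i, j) for i, j in enumerate(p)),
    )
    return list(best)


# Toy data (made up): two objects swap positions between frames, so the
# query cue points to the wrong match while appearance stays distinctive.
prev = [
    {"query": [1.0, 0.0], "appearance": [1.0, 0.0]},
    {"query": [0.0, 1.0], "appearance": [0.0, 1.0]},
]
curr = [
    {"query": [0.6, 0.4], "appearance": [0.0, 1.0]},
    {"query": [0.4, 0.6], "appearance": [1.0, 0.0]},
]

print(associate(prev, curr, appearance_weight=0.0))  # → [0, 1] (query cue only)
print(associate(prev, curr, appearance_weight=0.5))  # → [1, 0] (appearance included)
```

In this toy case the query-only matching keeps the identities attached to positions, while adding the appearance term recovers the swap. The equal weighting is an arbitrary choice for illustration, not a value taken from the paper.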

Code & Models

Repositories

kimhanjung/visage
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods