UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Jiannan Wu; Yi Jiang; Bin Yan; Huchuan Lu; Zehuan Yuan; Ping Luo

arXiv:2312.15715 · cs.CV · December 27, 2023 · 1 citation

TL;DR

UniRef++ introduces a unified architecture for four reference-based object segmentation tasks. It enables multi-task learning, achieves state-of-the-art results on RIS and RVOS benchmarks, and performs competitively on FSS and VOS.

Contribution

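The paper proposes UniRef++, a single architecture built around the UniFusion module that unifies referring image segmentation (RIS), few-shot image segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS). Because the task is determined by the reference supplied (a language expression or annotated masks), one jointly trained model can execute any of the tasks at run time, and multi-task training is efficient.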
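The four tasks differ along only two axes: the kind of reference (a language expression or an annotated mask) and the input (a single image or a video). The minimal sketch below encodes that taxonomy so that a single entry point can recover the task from the reference it is given. All names (`Reference`, `Media`, `TaskSpec`, `TASKS`, `infer_task`) are hypothetical illustrations and do not come from the UniRef++ codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Reference(Enum):
    LANGUAGE = "language expression"
    MASK = "annotated mask"


class Media(Enum):
    IMAGE = "image"
    VIDEO = "video"


@dataclass(frozen=True)
class TaskSpec:
    reference: Reference
    media: Media


# Hypothetical mapping of the four reference-based tasks to (reference, media).
TASKS = {
    "RIS": TaskSpec(Reference.LANGUAGE, Media.IMAGE),
    "FSS": TaskSpec(Reference.MASK, Media.IMAGE),
    "RVOS": TaskSpec(Reference.LANGUAGE, Media.VIDEO),
    "VOS": TaskSpec(Reference.MASK, Media.VIDEO),
}


def infer_task(reference: Reference, media: Media) -> str:
    """Return the task acronym implied by the reference type and input media."""
    wanted = TaskSpec(reference, media)
    for name, spec in TASKS.items():
        if spec == wanted:
            return name
    raise ValueError(f"no task defined for {reference} on {media}")
```

For example, `infer_task(Reference.MASK, Media.VIDEO)` returns `"VOS"`. Each (reference, media) pair maps to exactly one task, so the reference alone is enough to select the task, which is what makes a single shared model practical.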

Findings

1. Achieves state-of-the-art results on RIS and RVOS benchmarks.
2. Performs competitively on FSS and VOS with shared parameters.
3. Incorporates UniFusion into foundation models such as SAM for efficient fine-tuning.

Abstract

The reference-based object segmentation tasks, namely referring image segmentation (RIS), few-shot image segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS), aim to segment a specific object by utilizing either language or annotated masks as references. Despite significant progress in each respective field, current methods are task-specifically designed and developed in different directions, which hinders the activation of multi-task capabilities for these tasks. In this work, we end the current fragmented situation and propose UniRef++ to unify the four reference-based object segmentation tasks with a single architecture. At the heart of our approach is the proposed UniFusion module which performs multiway-fusion for handling different tasks with respect to their specified references. And a unified Transformer architecture is then adopted…
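The abstract describes UniFusion only as a module that performs multiway fusion of visual features with task-specific references. The minimal sketch below assumes that fusion is single-head scaled dot-product cross-attention followed by a residual add. The way a mask reference is turned into keys and values (zeroing reference-frame features outside the annotated mask) and all function names are hypothetical illustrations. This is not the paper's exact module.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of reference fusion; not the UniRef++ implementation.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: queries attend over keys/values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    scores = matmul(queries, [list(col) for col in zip(*keys)])
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, values)


def language_reference(word_features: Matrix) -> Tuple[Matrix, Matrix]:
    # Language reference: word features act as both keys and values.
    return word_features, word_features


def mask_reference(ref_features: Matrix, ref_mask: List[float]) -> Tuple[Matrix, Matrix]:
    # Mask reference (assumption): keys are the reference-frame features;
    # values are the same features zeroed outside the annotated mask.
    masked = [[m * v for v in row] for row, m in zip(ref_features, ref_mask)]
    return ref_features, masked


def fuse(visual: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Residual fusion: add reference-conditioned features to the visual tokens."""
    attended = cross_attention(visual, keys, values)
    return [[v + a for v, a in zip(vr, ar)] for vr, ar in zip(visual, attended)]
```

In this sketch only the construction of the keys and values changes between language and mask references, while `fuse` is shared. That mirrors the abstract's idea of one fusion module serving every task. For example, `fuse(visual, *mask_reference(ref, [0.0, 0.0]))` returns `visual` unchanged, because an empty mask contributes no reference signal.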

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · VOS · Absolute Position Encodings · Dropout · Layer Normalization · Residual Connection