AISFormer: Amodal Instance Segmentation with Transformer

Minh Tran; Khoa Vo; Kashu Yamazaki; Arthur Fernandes; Michael Kidd; and Ngan Le

arXiv:2210.06323·cs.CV·March 19, 2024·23 cites

Open Access · 1 Repo

TL;DR

AISFormer introduces a transformer-based framework for amodal instance segmentation that models the coherence between occluder, visible, amodal, and invisible masks, outperforming previous CNN-based methods on the KINS, D2SA, and COCOA-cls benchmarks.

Contribution

The paper presents AISFormer, a novel transformer-based model that explicitly captures high-level feature coherence for amodal segmentation, improving over prior CNN-based approaches.

Findings

01

Achieves state-of-the-art results on KINS, D2SA, and COCOA-cls benchmarks.

02

Effectively models occluder, visible, amodal, and invisible masks with learnable queries.

03

Demonstrates significant improvements through extensive ablation studies.

Abstract

Amodal Instance Segmentation (AIS) aims to segment the region of both the visible and the possibly occluded parts of an object instance. While Mask R-CNN-based AIS approaches have shown promising results, they are unable to model high-level feature coherence due to their limited receptive field. The most recent transformer-based models show impressive performance on vision tasks, even better than Convolutional Neural Networks (CNNs). In this work, we present AISFormer, an AIS framework with a Transformer-based mask head. AISFormer explicitly models the complex coherence between occluder, visible, amodal, and invisible masks within an object's regions of interest by treating them as learnable queries. Specifically, AISFormer contains four modules: (i) feature encoding: extract ROI and learn both short-range and long-range visual features; (ii) mask transformer decoding: generate the occluder,…
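The idea of treating the four masks as learnable queries can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before the decoder is fully described, so the single-head cross-attention step, the residual update, the dot-product mask prediction, the random initialisation, and all names (MASK_NAMES, cross_attend, decode_masks) are assumptions made purely for illustration, written in plain Python without any deep learning library.

```python
import math
import random

# One query per mask type named in the abstract.
MASK_NAMES = ("occluder", "visible", "amodal", "invisible")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Assumed single-head cross-attention: each query attends over all ROI tokens."""
    dim = len(tokens[0])
    updated = []
    for q in queries:
        weights = softmax([dot(q, t) / math.sqrt(dim) for t in tokens])
        context = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        # Residual update; a real decoder would also apply learned projections.
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


def decode_masks(queries, tokens, height, width):
    """Assumed mask prediction: sigmoid of the query/pixel-embedding dot product."""
    masks = {}
    for name, q in zip(MASK_NAMES, queries):
        probs = [1.0 / (1.0 + math.exp(-dot(q, t))) for t in tokens]
        masks[name] = [probs[r * width:(r + 1) * width] for r in range(height)]
    return masks


random.seed(0)
H, W, D = 4, 4, 8  # toy ROI size and embedding width (hypothetical values)

# Stand-ins for encoded ROI features and for the learnable queries.
roi_tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(H * W)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in MASK_NAMES]

queries = cross_attend(queries, roi_tokens)
masks = decode_masks(queries, roi_tokens, H, W)

for name in MASK_NAMES:
    print(name, [round(p, 2) for p in masks[name][0]])
```

Because all four queries attend to the same ROI tokens, each mask is predicted from a shared view of the whole region rather than from a local receptive field, which is the property the abstract contrasts with Mask R-CNN-based heads. In a trained model the queries and projections would be learned, so the random values printed here carry no meaning beyond showing the shapes involved.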

Code & Models

Repositories

uark-aicv/aisformer
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Convolution