BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, and Mei Chen

arXiv:2208.01159 · cs.CV · August 9, 2022 · 1 citation

TL;DR

BATMAN introduces a bilateral attention transformer that uses both motion and appearance cues, together with an optical flow calibration module, to improve semi-supervised video object segmentation (VOS), outperforming prior state-of-the-art methods on four benchmarks.

Contribution

The paper proposes a Bilateral Attention Transformer with optical flow calibration to better segment visually similar objects that are in close proximity to each other.

Findings

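01. Outperforms all existing state-of-the-art methods on four VOS benchmarks.

02. Computes query-reference correspondence from both motion and appearance cues, which improves segmentation of visually similar objects.

03. Improves within-object optical flow smoothness and reduces noise at object boundaries through optical flow calibration.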
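According to the abstract, the optical flow calibration module fuses the segmentation mask with the optical flow estimate so that flow is smoother inside objects and less noisy at their boundaries. The paper's module is not reproduced here. The Python sketch below is a hand-crafted stand-in that only illustrates the intended effect: each pixel's flow is averaged over a small window, but only over pixels that carry the same mask label, so motion is smoothed within an object without leaking across its boundary. The function name `calibrate_flow`, the `radius` parameter, and the list-of-lists layout are assumptions made for illustration.

```python
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # H x W grid of (dx, dy) flow vectors
Mask = List[List[int]]                  # H x W grid of integer object labels


def calibrate_flow(flow: Flow, mask: Mask, radius: int = 1) -> Flow:
    """Hypothetical mask-guided flow smoothing (a stand-in, not BATMAN's module).

    Each output vector is the mean flow over the window of the given radius
    around the pixel (clipped at the image border), restricted to pixels that
    share the center pixel's mask label.
    """
    h, w = len(flow), len(flow[0])
    calibrated: Flow = []
    for y in range(h):
        row = []
        for x in range(w):
            label = mask[y][x]
            sum_dx = sum_dy = 0.0
            count = 0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    # Skip neighbors from a different object so boundaries stay sharp.
                    if mask[ny][nx] != label:
                        continue
                    sum_dx += flow[ny][nx][0]
                    sum_dy += flow[ny][nx][1]
                    count += 1
            # count >= 1 because the center pixel always matches its own label.
            row.append((sum_dx / count, sum_dy / count))
        calibrated.append(row)
    return calibrated


# A 1 x 4 strip: two pixels of object 0 next to two pixels of object 1.
mask = [[0, 0, 1, 1]]
flow = [[(1.0, 0.0), (3.0, 0.0), (-2.0, 0.0), (-2.0, 0.0)]]
print(calibrate_flow(flow, mask))
# [[(2.0, 0.0), (2.0, 0.0), (-2.0, 0.0), (-2.0, 0.0)]]
```

The two inconsistent vectors inside object 0 are smoothed to their mean, while object 1 keeps its own motion. An unmasked 3-pixel box filter would instead give the second pixel (1 + 3 - 2) / 3 ≈ 0.67, mixing object 1's motion into object 0 at the boundary.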

Abstract

Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods have shown significant performance improvements on semi-supervised VOS. However, existing methods struggle to segment visually similar objects in close proximity to each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries. This calibrated optical flow is then employed in our novel bilateral attention, which computes the correspondence between the query and reference frames in the neighboring bilateral space considering both motion and appearance. Extensive experiments…
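The abstract describes bilateral attention as computing query-to-reference correspondence in a neighboring space defined by both motion and appearance. As a rough illustration only, the Python sketch below assumes one plausible reading: a query token attends only to reference tokens that are close to it both spatially and in calibrated optical flow, and appearance similarity (scaled dot products followed by a softmax) weights those neighbors. The `Token` type, the `pos_radius` and `motion_radius` thresholds, and this neighborhood rule are hypothetical and are not the paper's exact formulation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    pos: Tuple[float, float]     # spatial location (x, y) in the frame
    motion: Tuple[float, float]  # calibrated optical flow (dx, dy) at that location
    feat: List[float]            # appearance feature: the query or key vector


def bilateral_attention(
    queries: List[Token],
    refs: List[Token],
    values: List[List[float]],
    pos_radius: float,
    motion_radius: float,
) -> List[List[float]]:
    """Scaled dot-product attention restricted to a motion-appearance neighborhood.

    Hypothetical neighborhood rule: reference token j is a neighbor of a query
    if it lies within pos_radius spatially AND within motion_radius in flow space.
    """
    dim = len(queries[0].feat)
    out_dim = len(values[0])
    outputs: List[List[float]] = []
    for q in queries:
        nbrs = [
            j for j, r in enumerate(refs)
            if math.dist(q.pos, r.pos) <= pos_radius
            and math.dist(q.motion, r.motion) <= motion_radius
        ]
        if not nbrs:
            # No reference token falls in this query's neighborhood; emit zeros.
            outputs.append([0.0] * out_dim)
            continue
        # Appearance similarity decides the weights among the neighbors.
        scores = [sum(a * b for a, b in zip(q.feat, refs[j].feat)) / math.sqrt(dim) for j in nbrs]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        outputs.append([
            sum(e / total * values[j][c] for e, j in zip(exps, nbrs))
            for c in range(out_dim)
        ])
    return outputs


query = Token(pos=(0.0, 0.0), motion=(1.0, 0.0), feat=[1.0, 0.0])
refs = [
    Token(pos=(0.0, 0.0), motion=(1.0, 0.0), feat=[1.0, 0.0]),   # same object
    Token(pos=(1.0, 0.0), motion=(-1.0, 0.0), feat=[1.0, 0.0]),  # look-alike, opposite motion
    Token(pos=(5.0, 5.0), motion=(1.0, 0.0), feat=[1.0, 0.0]),   # far away
]
values = [[1.0], [5.0], [9.0]]
print(bilateral_attention([query], refs, values, pos_radius=1.5, motion_radius=0.5))   # [[1.0]]
print(bilateral_attention([query], refs, values, pos_radius=1.5, motion_radius=10.0))  # [[3.0]]
```

In the example, the second reference token looks identical to the query and sits right next to it, but moves in the opposite direction. With a tight motion radius it is excluded and the output comes only from the matching token; with the motion constraint effectively disabled, the two look-alikes receive equal weight and their values are averaged. This mirrors the paper's motivation of separating visually similar objects in close proximity.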

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Video Surveillance and Tracking Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Adam · Byte Pair Encoding · Label Smoothing · Residual Connection