D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

Christian Schmidt; Ali Athar; Sabarinath Mahadevan; Bastian Leibe

arXiv:2111.07774 · cs.CV · November 16, 2021

TL;DR

This paper introduces D^2Conv3D, a novel 3D convolution technique inspired by dilated and deformable convolutions, which improves video object segmentation performance and sets a new state-of-the-art on DAVIS 2016.

Contribution

The paper proposes D^2Conv3D, a new 3D convolution method that enhances video segmentation by leveraging dilated and deformable convolution principles.
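As a rough illustration of the dilated-convolution idea that the paper cites as inspiration, the sketch below lists the positions a fixed-dilation 3D kernel samples around its centre. It is not the authors' D^2Conv3D operator, whose details this summary does not give. The function names `dilated_offsets_3d` and `effective_extent`, and the (time, height, width) axis order, are assumptions made for this example, and odd kernel sizes are assumed.

```python
from itertools import product


def dilated_offsets_3d(kernel_size=(3, 3, 3), dilation=(1, 1, 1)):
    """Return the (t, y, x) offsets a dilated 3D kernel samples, relative to its centre.

    Hypothetical helper for illustration; assumes odd kernel sizes.
    """
    axes = []
    for k, d in zip(kernel_size, dilation):
        half = (k - 1) // 2
        # Each tap sits d steps away from its neighbour along this axis.
        axes.append([d * (i - half) for i in range(k)])
    return list(product(*axes))


def effective_extent(kernel_size, dilation):
    """Span covered along each axis: d * (k - 1) + 1."""
    return tuple(d * (k - 1) + 1 for k, d in zip(kernel_size, dilation))


# A 3x3x3 kernel with no temporal dilation and a spatial dilation of 2.
offsets = dilated_offsets_3d((3, 3, 3), (1, 2, 2))
extent = effective_extent((3, 3, 3), (1, 2, 2))
```

With these settings the kernel still has 27 taps, but they are spread over a 3 x 5 x 5 window. A deformable convolution would replace the fixed offsets with values predicted from the input, which is the other ingredient the paper draws on.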

Findings

1. D^2Conv3D improves performance across multiple video segmentation benchmarks.

2. D^2Conv3D outperforms simple 3D extensions of existing dilated and deformable convolutions.

3. D^2Conv3D achieves state-of-the-art results on the DAVIS 2016 benchmark.

Abstract

Despite receiving significant attention from the research community, the task of segmenting and tracking objects in monocular videos still has much room for improvement. Existing works have simultaneously justified the efficacy of dilated and deformable convolutions for various image-level segmentation tasks. This gives reason to believe that 3D extensions of such convolutions should also yield performance improvements for video-level segmentation tasks. However, this aspect has not yet been explored thoroughly in existing literature. In this paper, we propose Dynamic Dilated Convolutions (D^2Conv3D): a novel type of convolution which draws inspiration from dilated and deformable convolutions and extends them to the 3D (spatio-temporal) domain. We experimentally show that D^2Conv3D can be used to improve the performance of multiple 3D CNN architectures across multiple video segmentation…

Code & Models

Repositories

schmiddo/d2conv3d (PyTorch, official)

Videos

D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: 3 Dimensional Convolutional Neural Network · Convolution