D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos
Christian Schmidt, Ali Athar, Sabarinath Mahadevan, Bastian Leibe

TL;DR
This paper introduces D^2Conv3D, a novel 3D convolution technique inspired by dilated and deformable convolutions, which improves video object segmentation performance and sets a new state of the art on the DAVIS 2016 benchmark.
Contribution
The paper proposes D^2Conv3D, a new 3D convolution method that enhances video segmentation by leveraging dilated and deformable convolution principles.
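The sketch below makes the "dilated" half of that idea concrete. It is a plain 3D dilated convolution in pure Python with a single fixed, integer rate, which is the standard building block the paper extends. It is not D^2Conv3D itself. The function name `dilated_conv3d` and the `[t][y][x]` volume layout are assumptions made for illustration. Spacing the taps of a k×k×k kernel `dilation` voxels apart widens its spatio-temporal extent to `dilation * (k - 1) + 1` along each axis without adding any weights.

```python
from typing import List

# Hypothetical layout for this sketch: a volume is indexed as [t][y][x].
Volume = List[List[List[float]]]


def dilated_conv3d(volume: Volume, kernel: Volume, dilation: int = 1) -> Volume:
    """Valid (no padding), stride-1 3D cross-correlation with one uniform dilation rate."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    # Effective extent of a dilated kernel along each axis: d * (k - 1) + 1.
    et, eh, ew = (dilation * (k - 1) + 1 for k in (kt, kh, kw))
    out_t, out_h, out_w = T - et + 1, H - eh + 1, W - ew + 1
    if min(out_t, out_h, out_w) <= 0:
        raise ValueError("dilated kernel is larger than the input volume")
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_t)]
    for t in range(out_t):
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                for i in range(kt):
                    for j in range(kh):
                        for k in range(kw):
                            # Taps sit `dilation` voxels apart in time, height and width.
                            acc += kernel[i][j][k] * volume[t + i * dilation][
                                y + j * dilation][x + k * dilation]
                out[t][y][x] = acc
    return out


# Usage: a 5x5x5 volume whose voxel (t, y, x) holds 100*t + 10*y + x.
vol = [[[t * 100 + y * 10 + x for x in range(5)] for y in range(5)] for t in range(5)]
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
print(dilated_conv3d(vol, ones, dilation=2))  # one output voxel covering the whole 5x5x5 volume
```

With `dilation=2`, the same 27 weights cover a 5×5×5 window instead of a 3×3×3 one, so a single output voxel sums the input at time, row and column offsets {0, 2, 4}.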
Findings
D^2Conv3D improves performance across multiple video segmentation benchmarks.
D^2Conv3D outperforms simple 3D extensions of existing dilated and deformable convolutions.
D^2Conv3D achieves state-of-the-art results on the DAVIS 2016 benchmark.
Abstract
Despite receiving significant attention from the research community, the task of segmenting and tracking objects in monocular videos still has much room for improvement. Existing works have simultaneously justified the efficacy of dilated and deformable convolutions for various image-level segmentation tasks. This gives reason to believe that 3D extensions of such convolutions should also yield performance improvements for video-level segmentation tasks. However, this aspect has not yet been explored thoroughly in existing literature. In this paper, we propose Dynamic Dilated Convolutions (D^2Conv3D): a novel type of convolution which draws inspiration from dilated and deformable convolutions and extends them to the 3D (spatio-temporal) domain. We experimentally show that D^2Conv3D can be used to improve the performance of multiple 3D CNN architectures across multiple video segmentation…
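The abstract does not spell out how D^2Conv3D computes its output. The name and the stated inspiration (dilated plus deformable convolutions, extended to space and time) suggest one plausible reading, which the following sketch illustrates. A 3×3×3 kernel keeps its regular grid, but that grid is stretched by a continuous, per-voxel dilation rate along time, height and width. Fractional sample positions are then resolved by trilinear interpolation, as in deformable convolution. This is a hypothetical illustration, not the authors' formulation. `dynamic_dilated_conv3d`, `sample_trilinear` and the externally supplied `dilation_map` are names introduced here. How the rates would actually be predicted (for example, by a learned side branch) is deliberately left out.

```python
import math
from typing import List, Tuple

# Hypothetical layouts for this sketch (not taken from the paper):
Volume = List[List[List[float]]]                            # indexed as [t][y][x]
DilationMap = List[List[List[Tuple[float, float, float]]]]  # per voxel: (rate_t, rate_y, rate_x)


def sample_trilinear(volume: Volume, t: float, y: float, x: float) -> float:
    """Trilinearly interpolate `volume` at a fractional position; outside voxels count as zero."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t0, y0, x0 = math.floor(t), math.floor(y), math.floor(x)
    value = 0.0
    for ti in (t0, t0 + 1):
        for yi in (y0, y0 + 1):
            for xi in (x0, x0 + 1):
                if 0 <= ti < T and 0 <= yi < H and 0 <= xi < W:
                    weight = (1 - abs(t - ti)) * (1 - abs(y - yi)) * (1 - abs(x - xi))
                    value += weight * volume[ti][yi][xi]
    return value


def dynamic_dilated_conv3d(volume: Volume, kernel: Volume, dilation_map: DilationMap) -> Volume:
    """Hypothetical 'same'-size 3D conv whose tap spacing is a continuous, per-voxel rate."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    if kt % 2 == 0 or kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel sizes must be odd so the kernel has a centre tap")
    ct, ch, cw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                rt, ry, rx = dilation_map[t][y][x]
                acc = 0.0
                for i in range(kt):
                    for j in range(kh):
                        for k in range(kw):
                            # Regular grid offsets stretched by this voxel's own rates;
                            # fractional rates land between voxels, hence interpolation.
                            acc += kernel[i][j][k] * sample_trilinear(
                                volume, t + (i - ct) * rt, y + (j - ch) * ry, x + (k - cw) * rx)
                out[t][y][x] = acc
    return out


# Usage: uniform integer rates reproduce an ordinary dilated convolution.
vol = [[[t * 100 + y * 10 + x for x in range(5)] for y in range(5)] for t in range(5)]
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
rates = [[[(2.0, 2.0, 2.0)] * 5 for _ in range(5)] for _ in range(5)]
print(dynamic_dilated_conv3d(vol, ones, rates)[2][2][2])
```

With every rate set to 2.0, the centre output equals what a fixed dilation-2 kernel would give. Setting rates such as 1.5, or different rates per axis or per voxel, is the kind of flexibility a fixed integer dilation cannot express. Because interpolation makes the output differentiable with respect to the rates, the rates could in principle be learned, as offsets are in deformable convolution.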
Videos
D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos (YouTube)
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: 3 Dimensional Convolutional Neural Network · Convolution
