Learnable Sampling 3D Convolution for Video Enhancement and Action Recognition

Shuyang Gu; Jianmin Bao; Dong Chen

arXiv:2011.10974 · cs.CV · November 24, 2020

TL;DR

This paper introduces LS3D-Conv, a learnable sampling 3D convolution module that adaptively fuses multi-level features across frames, improving robustness and performance in video enhancement and action recognition tasks.

Contribution

The paper proposes a novel LS3D-Conv module with learnable offsets for flexible, task-specific sampling in 3D convolution, enhancing existing networks for video processing.
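Learnable offsets only work if the sampled feature value is differentiable with respect to them. A common way to get this, borrowed from deformable convolution and assumed here rather than taken from this page, is bilinear interpolation at fractional positions. The minimal pure-Python sketch below samples a 2D feature map at a fractional location, computes the analytic gradient with respect to the sampling position, and runs a toy gradient descent on a single offset. The names `bilinear_sample`, `bilinear_grad` and `fit_offset`, the zero padding, and the learning-rate settings are all hypothetical illustrations.

```python
import math


def _pixel(img, r, c):
    """Read img[r][c], treating out-of-range positions as zero (assumed padding)."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0.0


def _corners(img, y, x):
    """Return the four neighbouring values and the fractional weights at (y, x)."""
    y0, x0 = math.floor(y), math.floor(x)
    a, b = _pixel(img, y0, x0), _pixel(img, y0, x0 + 1)
    c, d = _pixel(img, y0 + 1, x0), _pixel(img, y0 + 1, x0 + 1)
    return a, b, c, d, y - y0, x - x0


def bilinear_sample(img, y, x):
    """Bilinear interpolation of a 2D map at a fractional position."""
    a, b, c, d, wy, wx = _corners(img, y, x)
    return (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)


def bilinear_grad(img, y, x):
    """Analytic derivative of bilinear_sample w.r.t. y and x (within one cell)."""
    a, b, c, d, wy, wx = _corners(img, y, x)
    d_dy = (1 - wx) * (c - a) + wx * (d - b)
    d_dx = (1 - wy) * (b - a) + wy * (d - c)
    return d_dy, d_dx


def fit_offset(img, base_y, base_x, target, lr=0.02, steps=100):
    """Toy gradient descent on one (dy, dx) offset so that the value sampled
    at (base_y + dy, base_x + dx) approaches `target` (hypothetical helper)."""
    dy = dx = 0.0
    for _ in range(steps):
        y, x = base_y + dy, base_x + dx
        err = bilinear_sample(img, y, x) - target
        g_y, g_x = bilinear_grad(img, y, x)
        # Chain rule: d(err**2)/d(offset) = 2 * err * d(sample)/d(position).
        dy -= lr * 2 * err * g_y
        dx -= lr * 2 * err * g_x
    return dy, dx


feature_map = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
print(bilinear_sample(feature_map, 0.5, 0.5))  # 2.0
print(fit_offset(feature_map, 0.5, 0.5, target=4.0))  # approximately (0.6, 0.2)
```

Because the sample varies smoothly with the offset inside each cell, an optimizer can move the sampling location toward whatever position reduces the loss; in a real network the loss would come from the task (for example a reconstruction error) rather than a fixed target value.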

Findings

1. Improves performance in video interpolation, super-resolution, denoising, and action recognition.

2. Shows greater robustness than methods that rely on explicit correspondence estimation between frames.

3. Can flexibly replace standard 3D convolution layers in existing 3D networks.

Abstract

A key challenge in video enhancement and action recognition is to fuse useful information from neighboring frames. Recent works suggest establishing accurate correspondences between neighboring frames before fusing temporal information. However, the generated results heavily depend on the quality of correspondence estimation. In this paper, we propose a more robust solution: sampling and fusing multi-level features across neighboring frames to generate the results. Based on this idea, we introduce a new module to improve the capability of 3D convolution, namely, learnable sampling 3D convolution (LS3D-Conv). We add learnable 2D offsets to 3D convolution that aim to sample locations on spatial feature maps across frames. The offsets can be learned for specific tasks. The LS3D-Conv can flexibly replace 3D convolution layers in existing 3D networks and get new…
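To make the mechanism in the abstract concrete, the sketch below is a minimal single-channel 3D convolution in which every kernel tap reads its neighboring frame at a spatial position shifted by a 2D offset, while the tap's temporal index stays fixed. It assumes stride 1, odd kernel sizes, zero padding, bilinear interpolation, and offsets supplied as a plain callable; in a trained network the offsets would instead be predicted by learned layers, which is an assumption rather than a detail given here. `ls3d_conv`, `zero_offsets` and `half_pixel_right` are hypothetical names, and, like deep learning frameworks, the function computes cross-correlation.

```python
import math


def _pixel(frame, r, c):
    """Read frame[r][c], treating out-of-range positions as zero padding."""
    if 0 <= r < len(frame) and 0 <= c < len(frame[0]):
        return frame[r][c]
    return 0.0


def bilinear(frame, y, x):
    """Bilinearly interpolate a 2D frame at a fractional (y, x) position."""
    y0, x0 = math.floor(y), math.floor(x)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * _pixel(frame, y0, x0)
            + (1 - wy) * wx * _pixel(frame, y0, x0 + 1)
            + wy * (1 - wx) * _pixel(frame, y0 + 1, x0)
            + wy * wx * _pixel(frame, y0 + 1, x0 + 1))


def zero_offsets(t, y, x, i, j, k):
    """Offset function that never shifts any tap."""
    return 0.0, 0.0


def ls3d_conv(video, weight, offset_fn):
    """Single-channel learnable-sampling 3D convolution (sketch).

    video:     T x H x W nested lists holding one feature channel.
    weight:    KT x KH x KW kernel (odd sizes; stride 1, 'same' padding).
    offset_fn: offset_fn(t, y, x, i, j, k) -> (dy, dx), the 2D spatial
               offset of kernel tap (i, j, k) at output position (t, y, x).
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    KT, KH, KW = len(weight), len(weight[0]), len(weight[0][0])
    pt, ph, pw = KT // 2, KH // 2, KW // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                acc = 0.0
                for i in range(KT):
                    ft = t + i - pt  # frame read by this tap; time is never offset
                    if not 0 <= ft < T:
                        continue  # zero padding along time
                    for j in range(KH):
                        for k in range(KW):
                            dy, dx = offset_fn(t, y, x, i, j, k)
                            # Regular grid position plus a learned 2D shift.
                            sy = y + j - ph + dy
                            sx = x + k - pw + dx
                            acc += weight[i][j][k] * bilinear(video[ft], sy, sx)
                out[t][y][x] = acc
    return out


def half_pixel_right(t, y, x, i, j, k):
    """Toy offset: shift every tap half a pixel to the right."""
    return 0.0, 0.5


# A 2-frame 3x3 clip and a 1x1x1 identity kernel with shifted sampling.
clip = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
        [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]
shifted = ls3d_conv(clip, [[[1.0]]], half_pixel_right)
print(shifted[0][0][0])  # 1.5, the average of clip[0][0][0] and clip[0][0][1]
```

With every offset set to zero the function reduces to an ordinary 3D convolution, so in this sketch a standard layer is recovered as a special case; nonzero offsets let each tap read from a shifted, possibly fractional, location in its neighboring frame instead of relying on a separately estimated correspondence.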

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques

Methods: 3D Convolution · Convolution