Semantic Video CNNs through Representation Warping

Raghudeep Gadde; Varun Jampani; Peter V. Gehler

arXiv:1708.03088·cs.CV·August 11, 2017



TL;DR

This paper introduces NetWarp, a module that uses optical flow between adjacent frames to warp intermediate CNN representations, turning static-image CNNs into video semantic segmentation CNNs that perform better at little additional computational cost.

Contribution

The work presents NetWarp, a warping module that lets existing CNN architectures exploit temporal information in video by warping their internal representations across frames, while remaining trainable end to end.

Findings

01. Achieves new state-of-the-art results (at the time of publication) on the CamVid and Cityscapes benchmark datasets.

02. Improves segmentation accuracy with little extra computational cost.

03. Works with a range of CNN architectures.

Abstract

In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp, and we demonstrate its use for a range of network architectures. The main design principle is to use the optical flow of adjacent frames to warp internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that the proposed approach incurs only a little extra computational cost while improving performance when video streams are available. We achieve new state-of-the-art results on the CamVid and Cityscapes benchmark datasets and show…
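The design principle above, warping internal network representations across time with optical flow, can be illustrated with a minimal NumPy sketch. It assumes backward flow (each pixel of the current frame points to its location in the previous frame), bilinear sampling with clamping at the image border, and a per-channel weighted sum that fuses the warped previous-frame features with the current-frame features. The names `bilinear_warp` and `netwarp_combine`, and this particular fusion rule, are hypothetical illustrations rather than the code in the authors' repository; any learned refinement of the flow is omitted.

```python
import numpy as np


def bilinear_warp(features, flow):
    """Warp a (C, H, W) feature map with a (2, H, W) flow field.

    Assumption: flow[0] is the horizontal (x) and flow[1] the vertical (y)
    displacement from each current-frame pixel to its position in the
    previous frame (backward warping).
    """
    C, H, W = features.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions in the previous frame, clamped to the image border.
    src_x = np.clip(xs + flow[0], 0, W - 1)
    src_y = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = src_x - x0
    wy = src_y - y0
    # Bilinear interpolation of the four neighbouring feature vectors;
    # indexing with (H, W) arrays keeps the result shaped (C, H, W).
    top = features[:, y0, x0] * (1 - wx) + features[:, y0, x1] * wx
    bottom = features[:, y1, x0] * (1 - wx) + features[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def netwarp_combine(current, previous, flow, w_cur, w_prev):
    """Hypothetical fusion: per-channel weighted sum of the current
    features and the flow-warped previous-frame features."""
    warped = bilinear_warp(previous, flow)
    return w_cur[:, None, None] * current + w_prev[:, None, None] * warped


if __name__ == "__main__":
    prev_feats = np.arange(12, dtype=float).reshape(1, 3, 4)
    cur_feats = np.ones((1, 3, 4))
    flow = np.zeros((2, 3, 4))
    flow[0] = 1.0  # every pixel moved one column between frames
    fused = netwarp_combine(cur_feats, prev_feats, flow,
                            w_cur=np.array([0.5]), w_prev=np.array([0.5]))
    print(fused)
```

In a trained network, `w_cur` and `w_prev` would be learned parameters. Because the bilinear warp is linear in the features and the fusion is a weighted sum, gradients flow back to both frames' representations, so a module of this kind can sit between existing layers and be trained end to end, the property the abstract emphasizes.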


Code & Models

Repositories

raghudeep/netwarp_public

Videos

Semantic Video CNNs through Representation Warping (YouTube)