Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

Mingyu Ding; Zhe Wang; Bolei Zhou; Jianping Shi; Zhiwu Lu; Ping Luo

arXiv:1911.12739·cs.CV·December 2, 2019·1 cites

TL;DR

This paper introduces a joint learning framework for video semantic segmentation and optical flow estimation, leveraging both tasks to improve accuracy and temporal consistency without extra inference costs.

Contribution

It presents a novel joint training approach that integrates semantic segmentation and optical flow estimation, utilizing both labeled and unlabeled frames for enhanced performance.

Findings

01. Outperforms existing methods in both segmentation and optical flow tasks.

02. Joint learning improves both robustness to occlusion and temporal consistency.

03. No additional inference computation is needed.

Abstract

A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame of each video clip is annotated, so most supervised methods cannot use information from the remaining frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flow, which encodes temporal consistency, to improve video segmentation. However, video segmentation and optical flow estimation are still treated as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information to handle occlusion for more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences to guarantee the temporal consistency of the segmentation.…
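The idea of using non-occluded flow as pixel-level correspondence for temporal consistency can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`warp_nearest`, `non_occluded_mask`, `consistency_loss`), the nearest-neighbor sampling, the tolerance `tol`, and the forward-backward check used to find occluded pixels are all assumptions chosen for illustration. In particular, the forward-backward check is a common generic heuristic swapped in here; the abstract says semantic information helps handle occlusion but does not give the mechanism in this excerpt.

```python
import numpy as np

def warp_nearest(src, flow):
    """Backward-warp src (H, W, C) using flow (H, W, 2).

    flow[..., 0] is the horizontal and flow[..., 1] the vertical displacement.
    Returns the warped array and a mask of pixels whose target lies in the image.
    Nearest-neighbor sampling is an illustrative simplification.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.rint(xs + flow[..., 0]).astype(int)
    y2 = np.rint(ys + flow[..., 1]).astype(int)
    inside = (x2 >= 0) & (x2 < w) & (y2 >= 0) & (y2 < h)
    # Clip so that indexing is valid; out-of-image pixels are flagged by `inside`.
    warped = src[np.clip(y2, 0, h - 1), np.clip(x2, 0, w - 1)]
    return warped, inside

def non_occluded_mask(flow_fw, flow_bw, tol=1.0):
    """Assumed occlusion test: forward and backward flow should cancel out."""
    bw_warped, inside = warp_nearest(flow_bw, flow_fw)
    err = np.linalg.norm(flow_fw + bw_warped, axis=-1)
    return inside & (err < tol)

def consistency_loss(prob_t, prob_t1, flow_fw, mask):
    """Mean L1 gap between frame-t predictions and warped frame-(t+1) predictions,
    averaged over non-occluded pixels only."""
    warped, _ = warp_nearest(prob_t1, flow_fw)
    diff = np.abs(prob_t - warped).sum(axis=-1)
    return float((diff * mask).sum() / max(mask.sum(), 1))

# Toy example: the whole scene moves one pixel to the right between frames.
rng = np.random.default_rng(0)
h, w, num_classes = 4, 6, 3
prob_t = rng.random((h, w, num_classes))
prob_t1 = np.roll(prob_t, 1, axis=1)          # frame t+1 predictions
flow_fw = np.zeros((h, w, 2)); flow_fw[..., 0] = 1.0
flow_bw = np.zeros((h, w, 2)); flow_bw[..., 0] = -1.0

mask = non_occluded_mask(flow_fw, flow_bw)
loss = consistency_loss(prob_t, prob_t1, flow_fw, mask)
```

In this toy case the last column is masked out because its correspondence leaves the image, and the loss over the remaining pixels is zero because the predictions follow the motion exactly. A training setup along these lines could apply such a term to unlabeled frames, which is one way the "every frame counts" idea might be realized; the exact loss in the paper may differ.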

Taxonomy

Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Advanced Image and Video Retrieval Techniques