# Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation

**Authors:** Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, Sungroh Yoon

arXiv: 1908.04501 · 2019-08-14

## TL;DR

This paper uses videos harvested from the web to improve weakly supervised semantic segmentation. Activated regions from successive frames are warped with optical flow and aggregated into a single localization map, which the authors report yields state-of-the-art results on PASCAL VOC 2012 test images.

## Contribution

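The paper proposes a frame-to-frame aggregation method: activated regions obtained from successive frames of web videos are warped with optical flow into a single image. The aggregated localization map covers more of the target object than a single-image activation and serves as proxy ground truth for training a segmentation network.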

## Key findings

- Achieves 65.0% mIoU on PASCAL VOC 2012 test images with a VGG-16 backbone.
- Achieves 67.4% mIoU on PASCAL VOC 2012 test images with a ResNet-101 backbone.
- Reported to outperform existing methods trained under the same level of supervision.
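For readers unfamiliar with the metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from a predicted and a ground-truth label map. It is not the official PASCAL VOC evaluation code. The function name `mean_iou`, the `ignore_index=255` convention for unlabeled pixels, and the choice to average only over classes that appear in the prediction or the ground truth are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in pred or gt (illustrative sketch).

    pred, gt: integer label maps of identical shape.
    ignore_index: ground-truth label excluded from scoring (assumed to be 255).
    """
    pred = np.asarray(pred).astype(np.int64)
    gt = np.asarray(gt).astype(np.int64)
    valid = gt != ignore_index

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    flat = num_classes * gt[valid] + pred[valid]
    conf = np.bincount(flat, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    inter = np.diag(conf)                          # correctly labeled pixels per class
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0                            # skip classes absent from both maps
    return float((inter[present] / union[present]).mean())
```

Benchmark scores such as those above are typically computed from a confusion matrix accumulated over the whole test set rather than averaged per image, so this single-pair version is only meant to show the arithmetic.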

## Abstract

When a deep neural network is trained on data with only image-level labeling, the regions activated in each image tend to identify only a small region of the target object. We propose a method of using videos automatically harvested from the web to identify a larger region of the target object by using temporal information, which is not present in the static image. The temporal variations in a video allow different regions of the target object to be activated. We obtain an activated region in each frame of a video, and then aggregate the regions from successive frames into a single image, using a warping technique based on optical flow. The resulting localization maps cover more of the target object, and can then be used as proxy ground-truth to train a segmentation network. This simple approach outperforms existing methods under the same level of supervision, and even approaches relying on extra annotations. Based on VGG-16 and ResNet 101 backbones, our method achieves the mIoU of 65.0 and 67.4, respectively, on PASCAL VOC 2012 test images, which represents a new state-of-the-art.
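The aggregation step described in the abstract can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: the function names, the nearest-neighbor backward warp, the flow convention (each reference pixel stores the `(dx, dy)` offset to its location in the other frame), and the element-wise maximum used to merge maps are all assumptions. In practice the flow fields would come from an optical-flow estimator and the activation maps from a classifier trained with image-level labels.

```python
import numpy as np


def warp_to_reference(act_map, flow):
    """Backward-warp an activation map (H, W) into the reference frame.

    flow has shape (H, W, 2); flow[y, x] = (dx, dy) is the assumed offset from
    reference pixel (x, y) to the matching pixel in the other frame.
    """
    h, w = act_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor lookup, clamped to the image border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return act_map[src_y, src_x]


def aggregate(ref_map, other_maps, flows):
    """Merge warped activation maps into the reference frame.

    Element-wise maximum is an assumed aggregation rule chosen for this sketch.
    """
    agg = ref_map.copy()
    for act_map, flow in zip(other_maps, flows):
        agg = np.maximum(agg, warp_to_reference(act_map, flow))
    return agg


# Toy example: the object shifts one pixel to the right between two frames,
# and a different part of it is activated in the second frame.
ref = np.zeros((3, 5))
ref[1, 1] = 1.0
nxt = np.zeros((3, 5))
nxt[1, 3] = 1.0
flow = np.zeros((3, 5, 2))
flow[..., 0] = 1.0  # every reference pixel maps one pixel to the right
combined = aggregate(ref, [nxt], [flow])
```

In the toy example, `combined` contains activations at two adjacent positions in the reference frame, which illustrates how aligning frames lets activations from different object parts accumulate into a larger region.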

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04501/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1908.04501/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1908.04501/full.md

---
Source: https://tomesphere.com/paper/1908.04501