# Align-and-Attend Network for Globally and Locally Coherent Video Inpainting

**Authors:** Sanghyun Woo, Dahun Kim, KwanYong Park, Joon-Young Lee, In So Kweon

arXiv: 1905.13066 · 2019-05-31

## TL;DR

This paper introduces a novel feed-forward video inpainting network that combines alignment and non-local attention modules to achieve globally and locally coherent results, effectively handling large or slowly moving holes.

## Contribution

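The proposed network combines a homography-based alignment module and a non-local attention module with a recurrent propagation stream. The first two provide globally and locally coherent content within each frame, and the propagation stream encourages temporal consistency across frames.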

## Key findings

- Inpaints large or slowly moving holes, using a large spatio-temporal reference window
- Handles challenging scenes that existing flow-based approaches have hardly modeled
- Encourages temporal consistency through a recurrent propagation stream

## Abstract

We propose a novel feed-forward network for video inpainting. We use a set of sampled video frames as references and take their visible contents to fill the hole of a target frame. Our video inpainting network consists of two stages. The first stage is an alignment module that uses computed homographies between the reference frames and the target frame. The visible patches are then aggregated based on the frame similarity to fill in the target holes roughly. The second stage is a non-local attention module that matches the generated patches with known reference patches (in space and time) to refine the result of the previous global alignment stage. Both stages use a large spatio-temporal window for the reference and thus enable modeling long-range correlations between distant information and the hole regions. Therefore, even challenging scenes with large or slowly moving holes, which have hardly been modeled by existing flow-based approaches, can be handled. Our network is also designed with a recurrent propagation stream to encourage temporal consistency in video results. Experiments on video object removal demonstrate that our method inpaints the holes with globally and locally coherent contents.
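
The first stage described in the abstract (warp each reference frame to the target with a homography, then aggregate visible content weighted by frame similarity) can be illustrated with a small pixel-level sketch. This is not the authors' implementation: the paper's network is learned, whereas the functions below (`warp_nearest`, `frame_similarity`, `coarse_fill`), the exponential similarity measure, and the `tau` parameter are all hypothetical choices made for illustration. The homographies are assumed to be given, and `None` marks a reference pixel that is not visible.

```python
import math

def warp_nearest(frame, H, height, width):
    """Warp `frame` into the target view with nearest-neighbor sampling.

    H is a 3x3 homography (nested lists) that maps target pixel
    coordinates (x, y, 1) to reference coordinates. Pixels that land
    outside the reference frame stay None (not visible).
    """
    out = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u = H[0][0] * x + H[0][1] * y + H[0][2]
            v = H[1][0] * x + H[1][1] * y + H[1][2]
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            if abs(w) < 1e-12:
                continue
            sx, sy = round(u / w), round(v / w)
            if 0 <= sy < len(frame) and 0 <= sx < len(frame[0]):
                out[y][x] = frame[sy][sx]
    return out

def frame_similarity(target, hole, warped, tau=10.0):
    """Similarity in (0, 1] from the mean absolute difference over pixels
    visible in both frames (an assumed measure, not the paper's)."""
    diffs = [abs(target[y][x] - warped[y][x])
             for y in range(len(target))
             for x in range(len(target[0]))
             if not hole[y][x] and warped[y][x] is not None]
    if not diffs:
        return 0.0
    return math.exp(-(sum(diffs) / len(diffs)) / tau)

def coarse_fill(target, hole, references, homographies, tau=10.0):
    """Fill hole pixels with a similarity-weighted average of the aligned
    reference pixels that are visible at the same location."""
    h, w = len(target), len(target[0])
    warped = [warp_nearest(r, H, h, w)
              for r, H in zip(references, homographies)]
    weights = [frame_similarity(target, hole, wr, tau) for wr in warped]
    result = [row[:] for row in target]
    for y in range(h):
        for x in range(w):
            if not hole[y][x]:
                continue
            num = den = 0.0
            for wr, wt in zip(warped, weights):
                if wr[y][x] is not None:
                    num += wt * wr[y][x]
                    den += wt
            if den > 0:  # leave the pixel untouched if no reference sees it
                result[y][x] = num / den
    return result
```

Weighting by frame similarity lets well-aligned references dominate the fill while poorly matching ones contribute little. The sketch covers only the rough fill; the attention-based refinement and the recurrent propagation stream are not modeled here.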

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13066/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13066/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1905.13066/full.md

---
Source: https://tomesphere.com/paper/1905.13066