# Weakly-Supervised Spatial Context Networks

**Authors:** Zuxuan Wu, Larry S. Davis, Leonid Sigal

arXiv: 1704.02998 · 2019-01-31

## TL;DR

This paper introduces spatial context networks that predict the representation of one image patch from another within the same image, leveraging spatial relationships as a self-supervisory signal to improve visual feature learning.

## Contribution

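The authors propose a spatial context network that encodes and reconstructs intermediate representations of image patches, conditioned on the real-valued relative spatial offset between them. Trained this way from a pre-trained initialization, the network improves on the pre-trained model without additional explicit supervision.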

## Key findings

- Object categorization and detection on VOC2007 improve for networks built on both VGG_19 and CNN_M.
- Focusing on object-centric patches is important; using object proposals to select patches yields the highest improvement.
- Spatial context supervision enhances existing pre-trained models.
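
To make the patch-selection idea concrete, here is a minimal sketch of sampling a pair of patches from a set of object proposals and computing their real-valued relative offset. The box format `(x0, y0, x1, y1)`, the center-to-center offset, the normalization by image size, and the helper names are all assumptions made for illustration; the summary above does not specify how the paper defines or normalizes offsets.

```python
import numpy as np

def box_center(box):
    """Center (cx, cy) of a box given as (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = box
    return np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])

def relative_offset(src_box, dst_box, image_size):
    """Offset from the source center to the target center.

    Dividing by image width and height is an assumed normalization,
    chosen here only to keep the values in a small range.
    """
    width, height = image_size
    delta = box_center(dst_box) - box_center(src_box)
    return delta / np.array([width, height], dtype=float)

def sample_patch_pair(boxes, image_size, rng):
    """Pick two distinct boxes and return them with their relative offset."""
    i, j = rng.choice(len(boxes), size=2, replace=False)
    return boxes[i], boxes[j], relative_offset(boxes[i], boxes[j], image_size)

# Hypothetical proposals for a 400x300 image (not real detector output).
proposals = [(10, 20, 110, 120), (200, 40, 300, 140), (50, 150, 150, 250)]
rng = np.random.default_rng(0)
src, dst, offset = sample_patch_pair(proposals, image_size=(400, 300), rng=rng)
```

Swapping `proposals` for randomly placed boxes would give a random-patch baseline under the same interface, which is one way to compare patch selection mechanisms.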

## Abstract

We explore the power of spatial context as a self-supervisory signal for learning visual representations. In particular, we propose spatial context networks that learn to predict a representation of one image patch from another image patch, within the same image, conditioned on their real-valued relative spatial offset. Unlike auto-encoders, which aim to encode and reconstruct original image patches, our network aims to encode and reconstruct intermediate representations of the spatially offset patches. As such, the network learns a spatially conditioned contextual representation. By testing performance with various patch selection mechanisms, we show that focusing on object-centric patches is important, and that using object proposals as a patch selection mechanism leads to the highest improvement in performance. Further, unlike auto-encoders, context encoders [21], or other forms of unsupervised feature learning, we illustrate that contextual supervision (with pre-trained model initialization) can improve on existing pre-trained model performance. We build our spatial context networks on top of standard VGG_19 and CNN_M architectures and, among other things, show that we can achieve improvements (with no additional explicit supervision) over the original ImageNet pre-trained VGG_19 and CNN_M models in object categorization and detection on VOC2007.
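
The core idea in the abstract, predicting one patch's intermediate representation from another patch's representation plus their relative offset, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's architecture: the two-layer MLP, the concatenation of the offset with the source feature, the feature sizes, and the mean-squared-error loss are all placeholders, and random vectors stand in for features that would come from a VGG_19 or CNN_M backbone.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HIDDEN_DIM = 16, 32  # toy sizes, not taken from the paper

# Randomly initialized weights of a hypothetical two-layer predictor.
params = {
    "W1": rng.normal(0.0, 0.1, (FEAT_DIM + 2, HIDDEN_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "W2": rng.normal(0.0, 0.1, (HIDDEN_DIM, FEAT_DIM)),
    "b2": np.zeros(FEAT_DIM),
}

def predict_target_feature(src_feat, offset, params):
    """Predict the target patch's feature from the source feature and offset.

    Conditioning by concatenating the (dx, dy) offset is an assumption.
    """
    x = np.concatenate([src_feat, offset])
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    return hidden @ params["W2"] + params["b2"]

def context_loss(pred_feat, target_feat):
    """Mean squared error in feature space (an assumed loss choice)."""
    return float(np.mean((pred_feat - target_feat) ** 2))

# Stand-ins for intermediate representations of two patches from one image.
src_feat = rng.normal(size=FEAT_DIM)
target_feat = rng.normal(size=FEAT_DIM)
offset = np.array([0.475, 0.067])  # example real-valued relative offset

pred = predict_target_feature(src_feat, offset, params)
loss = context_loss(pred, target_feat)
```

The point of the sketch is the contrast with an auto-encoder: the reconstruction target is the feature of a different, spatially offset patch rather than the pixels of the input patch.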

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.02998/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1704.02998/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1704.02998/full.md

---
Source: https://tomesphere.com/paper/1704.02998