Non-local Neural Networks
Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

TL;DR
This paper introduces non-local operations as a versatile building block for neural networks, enabling the capture of long-range dependencies in vision tasks, and demonstrates their effectiveness across video classification and image recognition benchmarks.
Contribution
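The paper proposes a generic non-local operation that computes the response at each position as a weighted sum of the features at all positions. This lets it capture long-range dependencies, and it can be plugged into many existing computer vision architectures.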
Findings
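On video classification, non-local models compete with or outperform the current competition winners on both the Kinetics and Charades datasets.
On static image recognition, non-local models improve object detection, segmentation, and pose estimation on the COCO suite of tasks.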
Non-local operations effectively model long-range dependencies in vision tasks.
Abstract
Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net .
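The "weighted sum of the features at all positions" described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the embedded Gaussian affinity listed under Methods, implemented as a softmax over dot products of linearly embedded features. The function name `non_local_op`, the weight matrices `w_theta`, `w_phi`, `w_g`, and all sizes are hypothetical choices made for illustration and are not taken from the authors' code. The output projection and residual connection of a full non-local block are left out.

```python
import numpy as np


def non_local_op(x, w_theta, w_phi, w_g):
    """Sketch of a non-local operation with embedded Gaussian affinity.

    x:        (N, C) array, one C-dimensional feature per position.
    w_theta,
    w_phi,
    w_g:      (C, D) embedding matrices (hypothetical names).
    Returns the (N, D) responses and the (N, N) attention weights.
    """
    theta = x @ w_theta            # query-like embedding, (N, D)
    phi = x @ w_phi                # key-like embedding,   (N, D)
    g = x @ w_g                    # value-like embedding, (N, D)

    # Pairwise affinity between every position i and every position j.
    logits = theta @ phi.T         # (N, N)

    # Softmax over j: exponentiate and normalize each row.
    # Subtracting the row maximum keeps the exponentials numerically stable.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Response at position i is a weighted sum of g(x_j) over all positions j.
    y = weights @ g                # (N, D)
    return y, weights


# Toy input: 6 positions (e.g. flattened space-time locations), 8 channels.
rng = np.random.default_rng(0)
n_positions, channels, inner = 6, 8, 4
x = rng.standard_normal((n_positions, channels))
w_theta = 0.1 * rng.standard_normal((channels, inner))
w_phi = 0.1 * rng.standard_normal((channels, inner))
w_g = 0.1 * rng.standard_normal((channels, inner))

y, weights = non_local_op(x, w_theta, w_phi, w_g)
print(y.shape, weights.shape)
```

Each row of `weights` sums to one, so every output position is a convex combination of the embedded features at all positions, regardless of how far apart they are. For a video feature map, the positions would be the flattened time, height, and width locations.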
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Average Pooling · ResNeXt Block · Concatenation Affinity · Embedded Dot Product Affinity · Embedded Gaussian Affinity · Non-Local Block · Non-Local Operation · Weight Decay · SGD with Momentum · Grouped Convolution
