TL;DR
This paper offers a theoretical framework for understanding nonlocal blocks in neural networks as graph filters, introduces a spectral nonlocal block, and demonstrates its effectiveness across various vision tasks.
Contribution
It provides a unified graph filter perspective for nonlocal blocks and proposes a spectral variant that improves robustness and flexibility.
Findings
The spectral nonlocal block outperforms existing nonlocal blocks in experiments.
The proposed method enhances performance in image classification, segmentation, and recognition.
Theoretical analysis unifies and explains the properties of nonlocal-based modules.
Abstract
The nonlocal-based blocks are designed for capturing long-range spatial-temporal dependencies in computer vision tasks. Although they have shown excellent performance, they still lack a mechanism to encode the rich, structured information among elements in an image or video. In this paper, to theoretically analyze the properties of these nonlocal-based blocks, we provide a new perspective for interpreting them: we view them as a set of graph filters generated on a fully-connected graph. Specifically, when the Chebyshev graph filter is chosen, a unified formulation can be derived for explaining and analyzing the existing nonlocal-based blocks (e.g., nonlocal block, nonlocal stage, double attention block). Furthermore, by considering their spectral properties, we propose an efficient and robust spectral nonlocal block, which can be more robust and flexible in capturing long-range dependencies…
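To make the graph-filter view in the abstract concrete, here is a minimal NumPy sketch of a Chebyshev polynomial filter applied to a fully-connected graph built from feature similarities. It is an illustration under stated assumptions, not the paper's exact block: the exponentiated dot-product affinity, the symmetric normalized Laplacian, the bound `lam_max=2.0`, and the names `affinity`, `normalized_laplacian`, and `chebyshev_filter` are all choices made here for the example.

```python
import numpy as np

def affinity(X):
    # Assumed affinity: exponentiated dot-product similarity between all
    # pairs of elements, giving a dense (fully-connected) weighted graph.
    S = X @ X.T
    return np.exp(S - S.max())  # subtract the max for numerical stability

def normalized_laplacian(A):
    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def chebyshev_filter(X, theta, lam_max=2.0):
    # Computes sum_k theta[k] * T_k(L_tilde) @ X, where T_k is the k-th
    # Chebyshev polynomial and L_tilde rescales the spectrum to [-1, 1].
    L = normalized_laplacian(affinity(X))
    L_tilde = (2.0 / lam_max) * L - np.eye(len(L))
    T_prev = X                      # T_0(L_tilde) X = X
    out = theta[0] * T_prev
    if len(theta) > 1:
        T_curr = L_tilde @ X        # T_1(L_tilde) X = L_tilde X
        out = out + theta[1] * T_curr
        for k in range(2, len(theta)):
            # Recurrence: T_k = 2 * L_tilde * T_{k-1} - T_{k-2}
            T_next = 2 * L_tilde @ T_curr - T_prev
            out = out + theta[k] * T_next
            T_prev, T_curr = T_curr, T_next
    return out

# Example: 6 elements (e.g. pixels) with 4-dimensional features.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
Y = chebyshev_filter(X, theta=[0.5, 0.3, 0.2])
```

In this sketch the coefficients `theta` are fixed numbers; in a trained block they would be learned, and the filtered output would typically be added back to the input as a residual.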
