ORDNet: Capturing Omni-Range Dependencies for Scene Parsing

Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, and Shuicheng Yan

arXiv:2101.03929·cs.CV·January 12, 2021

TL;DR

ORDNet is a neural network architecture that captures short-, middle-, and long-range dependencies in scene images, improving scene parsing accuracy on PASCAL Context, COCO Stuff, and ADE20K.

Contribution

The paper introduces a Middle-Range branch and a Reweighed Long-Range branch to fill the gap between local and global dependencies in scene parsing models.

Findings

01 Outperforms prior state-of-the-art methods on PASCAL Context

02 Achieves superior results on COCO Stuff

03 Sets a new state of the art on ADE20K

Abstract

Learning to capture dependencies between spatial positions is essential to many visual tasks, especially dense labeling problems such as scene parsing. Existing methods can effectively capture long-range dependencies with self-attention mechanisms and short-range ones with local convolution. However, a large gap remains between long-range and short-range dependencies, which substantially reduces a model's flexibility in handling the diverse spatial scales and relationships found in complicated natural scene images. To fill this gap, we develop a Middle-Range (MR) branch that captures middle-range dependencies by restricting self-attention to local patches. We also observe that spatial regions that have large correlations with others can be emphasized to exploit long-range dependencies more accurately, and thus propose a Reweighed Long-Range (RLR) branch. Based on the proposed MR and RLR…
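The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the non-overlapping patch layout, the unlearned dot-product attention (no query/key/value projections), and in particular the reweighting rule (scaling each position by the total attention it receives) are assumptions made only to show what "restricting self-attention to local patches" and "emphasizing highly correlated regions" could look like.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feats):
    """Plain dot-product self-attention over N positions; feats is (N, C)."""
    affinity = softmax(feats @ feats.T / np.sqrt(feats.shape[1]))
    return affinity @ feats, affinity

def patch_self_attention(fmap, patch):
    """Middle-range sketch: attention restricted to non-overlapping patches.

    fmap is (H, W, C); H and W must be divisible by `patch` (an assumption
    of this sketch, not a requirement stated in the source).
    """
    H, W, C = fmap.shape
    assert H % patch == 0 and W % patch == 0
    out = np.empty_like(fmap, dtype=float)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            # Positions only attend to other positions in the same patch.
            block = fmap[i:i + patch, j:j + patch].reshape(-1, C)
            attended, _ = self_attention(block)
            out[i:i + patch, j:j + patch] = attended.reshape(patch, patch, C)
    return out

def reweighted_global_attention(fmap):
    """Long-range sketch with an ASSUMED reweighting rule."""
    H, W, C = fmap.shape
    attended, affinity = self_attention(fmap.reshape(-1, C))
    # ASSUMPTION: a position's importance is the total attention it receives
    # from all positions (column sum). Rows of `affinity` sum to 1, so these
    # weights average to 1 across positions.
    importance = affinity.sum(axis=0)
    return (attended * importance[:, None]).reshape(H, W, C)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8, 16))
middle = patch_self_attention(fmap, patch=4)
long_range = reweighted_global_attention(fmap)
print(middle.shape, long_range.shape)
```

With `patch` equal to the full feature-map size, the patch-restricted version reduces to ordinary global self-attention; smaller patches shrink the range each position can see, which is the sense in which it sits between local convolution and global attention.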
