ORDNet: Capturing Omni-Range Dependencies for Scene Parsing
Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, and Shuicheng Yan

TL;DR
ORDNet is a neural network architecture for scene parsing that captures short-, middle-, and long-range dependencies between spatial positions in scene images, improving parsing accuracy on PASCAL Context, COCO Stuff, and ADE20K.
Contribution
The paper introduces a Middle-Range branch and a Reweighed Long-Range branch to fill the gap between local and global dependencies in scene parsing models.
Findings
Outperforms previous state-of-the-art methods on PASCAL Context
Achieves superior results on COCO Stuff
Outperforms previous state-of-the-art methods on ADE20K
Abstract
Learning to capture dependencies between spatial positions is essential to many visual tasks, especially dense labeling problems such as scene parsing. Existing methods capture long-range dependencies effectively with the self-attention mechanism and short-range ones with local convolution. However, a large gap remains between long-range and short-range dependencies, which greatly reduces the models' flexibility when applied to the diverse spatial scales and relationships found in complicated natural scene images. To fill this gap, we develop a Middle-Range (MR) branch that captures middle-range dependencies by restricting self-attention to local patches. We also observe that spatial regions with large correlations to other regions can be emphasized to exploit long-range dependencies more accurately, and thus propose a Reweighed Long-Range (RLR) branch. Based on the proposed MR and RLR…
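The abstract describes the two branches only at a high level. The minimal pure-Python sketch below illustrates the ideas under stated assumptions: attention is plain scaled dot-product self-attention with no learned query/key/value projections, `middle_range` restricts it to non-overlapping `patch x patch` windows, and `reweighed_long_range` is a hypothetical reading of the RLR idea that scales each attention weight by how much attention the target position receives from all positions (its column mean), then renormalizes. All function names, the patch size, and the column-mean reweighting rule are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(feats: List[Vector]) -> List[List[float]]:
    """Row-normalized scaled dot-product affinities (assumption: queries = keys = feats)."""
    scale = 1.0 / math.sqrt(len(feats[0]))
    return [softmax([scale * sum(a * b for a, b in zip(q, k)) for k in feats]) for q in feats]


def aggregate(attn: List[List[float]], feats: List[Vector]) -> List[Vector]:
    """Each output position is the attention-weighted sum of the values (values = feats)."""
    dim = len(feats[0])
    return [[sum(row[j] * feats[j][c] for j in range(len(feats))) for c in range(dim)]
            for row in attn]


def middle_range(fmap: List[List[Vector]], patch: int) -> List[List[Vector]]:
    """Hypothetical MR-style branch: self-attention restricted to patch x patch windows."""
    h, w = len(fmap), len(fmap[0])
    out: List[List[Vector]] = [[[] for _ in range(w)] for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Positions inside this window; edge windows may be smaller than patch x patch.
            coords = [(y, x) for y in range(top, min(top + patch, h))
                      for x in range(left, min(left + patch, w))]
            local = [fmap[y][x] for y, x in coords]
            for (y, x), vec in zip(coords, aggregate(attention_map(local), local)):
                out[y][x] = vec
    return out


def reweighed_long_range(feats: List[Vector]) -> List[Vector]:
    """Hypothetical RLR-style branch: global attention that emphasizes positions which
    receive more attention overall (assumed rule), then renormalizes each row."""
    attn = attention_map(feats)
    n = len(feats)
    importance = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]  # column means
    reweighed = []
    for row in attn:
        scaled = [a * r for a, r in zip(row, importance)]
        total = sum(scaled)
        reweighed.append([s / total for s in scaled])
    return aggregate(reweighed, feats)


# Usage: a 4x4 feature map with 2 channels (each position stores its own (y, x)).
fmap = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
mr = middle_range(fmap, patch=2)
rlr = reweighed_long_range([v for row in fmap for v in row])
print(len(mr), len(mr[0]), len(rlr))  # prints: 4 4 16
```

Restricting attention to windows means each position scores only the `patch * patch` positions in its own window instead of all `H * W` positions, which is why a patch-level branch sits between a local convolution and full self-attention in both receptive field and cost.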
