Graph Reasoning Transformer for Image Parsing

Dong Zhang, Jinhui Tang, and Kwang-Ting Cheng

arXiv:2209.09545 · cs.CV · September 21, 2022

TL;DR

This paper introduces a Graph Reasoning Transformer (GReaT) that enhances image parsing by enabling relation-based interactions among image patches, improving efficiency and accuracy over traditional transformers.

Contribution

The paper proposes a graph reasoning framework within a transformer for image parsing, addressing two problems of attention-based patch interaction: redundant interactions among intra-class patches and unoriented interactions among inter-class patches.

Findings

1. GReaT outperforms baseline transformers on Cityscapes and ADE20K datasets.
2. GReaT achieves higher interaction efficiency with minimal computational overhead.
3. Experimental results demonstrate consistent performance improvements.

Abstract

Capturing long-range dependencies has empirically proven effective on a wide range of computer vision tasks. Progress on this topic has been made by employing the transformer framework with its multi-head attention mechanism. However, attention-based image patch interaction potentially suffers from redundant interactions among intra-class patches and unoriented interactions among inter-class patches. In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing that enables image patches to interact following a relation reasoning pattern. Specifically, the linearly embedded image patches are first projected into the graph space, where each node represents the implicit visual center for a cluster of image patches and each edge reflects the relation weight between two adjacent nodes. After that, global…
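The projection step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the soft-assignment rule, the scaled dot-product similarity, and the names `project_to_graph`, `centers`, and `assign` are all assumptions made for illustration. It only shows one plausible way that patch embeddings could be pooled into nodes (one per visual center) with weighted edges between them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_graph(patches, centers):
    """Pool patch embeddings into graph nodes (illustrative assumption).

    patches: (N, D) linearly embedded image patches.
    centers: (K, D) hypothetical visual centers (learnable in practice).
    Returns nodes (K, D), adjacency (K, K), and assign (N, K).
    """
    dim = patches.shape[1]
    # Soft-assign every patch to the K centers; each row sums to 1.
    assign = softmax(patches @ centers.T / np.sqrt(dim), axis=1)
    # Normalize per center so each node is a weighted mean of its patches.
    weights = assign / assign.sum(axis=0, keepdims=True)
    nodes = weights.T @ patches
    # Edge weights: row-normalized similarity between node features.
    adjacency = softmax(nodes @ nodes.T / np.sqrt(dim), axis=1)
    return nodes, adjacency, assign


# Toy usage: 16 patches of dimension 8 pooled into 4 nodes.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
centers = rng.normal(size=(4, 8))
nodes, adjacency, assign = project_to_graph(patches, centers)
print(nodes.shape, adjacency.shape, assign.shape)
```

Because the number of nodes K is much smaller than the number of patches N, any reasoning carried out on the graph works with a K × K relation matrix rather than the N × N matrix of full patch-to-patch attention, which is one way to read the efficiency claim above.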

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Residual Connection · Dense Connections