Graph Reasoning Transformer for Image Parsing
Dong Zhang, Jinhui Tang, and Kwang-Ting Cheng

TL;DR
This paper introduces a Graph Reasoning Transformer (GReaT) that enhances image parsing by enabling relation-based interactions among image patches, improving efficiency and accuracy over traditional transformers.
Contribution
The paper proposes a graph reasoning framework within a transformer for image parsing. It targets two weaknesses of attention-based patch interaction: redundant interactions among intra-class patches and unoriented interactions among inter-class patches.
Findings
GReaT outperforms baseline transformers on Cityscapes and ADE20K datasets.
GReaT achieves higher interaction efficiency with minimal computational overhead.
Experimental results demonstrate consistent performance improvements.
Abstract
Capturing the long-range dependencies has empirically proven to be effective on a wide range of computer vision tasks. The progressive advances on this topic have been made through the employment of the transformer framework with the help of the multi-head attention mechanism. However, the attention-based image patch interaction potentially suffers from problems of redundant interactions of intra-class patches and unoriented interactions of inter-class patches. In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern. Specifically, the linearly embedded image patches are first projected into the graph space, where each node represents the implicit visual center for a cluster of image patches and each edge reflects the relation weight between two adjacent nodes. After that, global…
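The graph projection described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the soft-assignment scheme, the scaled dot-product similarity, and the names `project_to_graph`, `centers`, and `assign` are all assumptions made for illustration. It covers only the step the abstract states in full, where patch embeddings become graph nodes (one implicit visual center per cluster of patches) and edges carry relation weights between nodes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def project_to_graph(patches, centers):
    """Project patch embeddings into a small graph (illustrative only).

    patches: (N, D) linearly embedded image patches.
    centers: (K, D) hypothetical visual centers (learned in a real model;
             random here). This parameterization is an assumption.
    """
    dim = patches.shape[1]
    # Soft assignment of every patch to every node, shape (N, K).
    assign = softmax(patches @ centers.T / np.sqrt(dim), axis=1)
    # Node features: assignment-weighted mean of the patches, shape (K, D).
    weights = assign / assign.sum(axis=0, keepdims=True)
    nodes = weights.T @ patches
    # Edge weights: row-normalized similarity between nodes, shape (K, K).
    edges = softmax(nodes @ nodes.T / np.sqrt(dim), axis=1)
    return assign, nodes, edges

rng = np.random.default_rng(0)
patches = rng.standard_normal((196, 64))  # e.g. a 14x14 grid of patches
centers = rng.standard_normal((8, 64))    # 8 graph nodes (assumed count)
assign, nodes, edges = project_to_graph(patches, centers)
print(assign.shape, nodes.shape, edges.shape)
```

With far fewer nodes than patches, any reasoning over the graph touches a K x K relation matrix rather than an N x N attention map, which is one plausible reading of the efficiency claim above.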
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Residual Connection · Dense Connections
