A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective
Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang, Xiaoguang Han, Yizhou Yu

TL;DR
This survey comprehensively reviews the application of Graph Neural Networks and Graph Transformers in computer vision, categorizing their use across various data modalities and vision tasks to analyze their effectiveness and future prospects.
Contribution
It provides a task-oriented taxonomy of GNNs and graph Transformers in computer vision, covering applications across multiple data modalities and discussing their challenges and future directions.
Findings
Categorizes applications by data modality and vision task
Analyzes the performance of GNNs and graph Transformers in different tasks
Discusses limitations and future research directions
Abstract
Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories…
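To make the contrast in the abstract concrete, the following minimal sketch compares one round of local neighborhood aggregation with a single attention step over all node pairs. It is an illustration under stated assumptions, not a method taken from the survey: the function names, the toy path graph, and the additive `edge_bias` are hypothetical, and learned projection matrices are omitted for brevity.

```python
import math


def mean_aggregate(features, adjacency):
    """One round of local aggregation: each node averages its own
    feature vector with those of its direct neighbors."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        # Node i itself plus every node j that shares an edge with i.
        nbrs = [i] + [j for j in range(n) if j != i and adjacency[i][j]]
        out.append([sum(features[j][d] for j in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out


def graph_biased_attention(features, adjacency, edge_bias=1.0):
    """Single-head dot-product attention over ALL node pairs.
    Assumption: graph structure enters only as an additive bias on the
    scores of connected pairs (and self pairs); no learned weights."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            s = sum(a * b for a, b in zip(features[i], features[j]))
            s /= math.sqrt(dim)
            if i == j or adjacency[i][j]:
                s += edge_bias
            scores.append(s)
        # Numerically stable softmax over the scores of node i.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(weights[j] * features[j][d] for j in range(n))
                    for d in range(dim)])
    return out


# Toy path graph 0 - 1 - 2 - 3; only the end nodes carry a signal.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

local_out = mean_aggregate(features, adjacency)
global_out = graph_biased_attention(features, adjacency)
```

In this toy setup, node 0 and node 3 are three hops apart, so `local_out[0]` contains nothing from node 3 after a single round, whereas `global_out[0]` already mixes in node 3's feature with a small attention weight. The bias only nudges attention toward connected pairs rather than restricting it to them, which is one simple way to read "avoiding strict structural inductive biases".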
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Dropout · Dense Connections · Residual Connection · Absolute Position Encodings · Position-Wise Feed-Forward Layer
