Multi-Graph Transformer for Free-Hand Sketch Recognition
Peng Xu, Chaitanya K. Joshi, Xavier Bresson

TL;DR
This paper introduces the Multi-Graph Transformer (MGT), a graph neural network that represents each sketch as multiple sparsely connected graphs capturing global and local geometric stroke structure as well as temporal information. It outperforms RNN-based methods and comes close to the CNN-based performance upper bound.
Contribution
The paper is presented as the first to represent sketches as graphs and apply GNNs to sketch recognition, combining global geometric, local geometric, and temporal information in a unified framework.
Findings
Achieves 72.80% recognition accuracy on 414k sketches from Google QuickDraw.
Outperforms all RNN-based models by a significant margin.
Comes close to the CNN-based performance upper bound.
Abstract
Learning meaningful representations of free-hand sketches remains a challenging task given the signal sparsity and the high-level abstraction of sketches. Existing techniques have focused on exploiting either the static nature of sketches with Convolutional Neural Networks (CNNs) or the temporal sequential property with Recurrent Neural Networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel Graph Neural Network (GNN), the Multi-Graph Transformer (MGT), for learning representations of sketches from multiple graphs which simultaneously capture global and local geometric stroke structures, as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. Particularly, MGT applied on 414k sketches from Google QuickDraw:…
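The idea of describing one sketch with several sparse graphs can be illustrated with a small example. The sketch below is only one plausible construction, not the authors' implementation: it assumes the sketch's points are listed in drawing order (so index distance stands in for temporal distance) and that each point carries a stroke index. The function names `k_hop_adjacency` and `stroke_link_adjacency` are hypothetical.

```python
import numpy as np


def k_hop_adjacency(stroke_ids, k):
    """Local graph (assumed construction): link points that are at most
    k steps apart in drawing order and belong to the same stroke."""
    ids = np.asarray(stroke_ids)
    idx = np.arange(len(ids))
    close_in_time = np.abs(idx[:, None] - idx[None, :]) <= k
    same_stroke = ids[:, None] == ids[None, :]
    return (close_in_time & same_stroke).astype(np.int8)


def stroke_link_adjacency(stroke_ids):
    """Global graph (assumed construction): self-loops plus a link between
    the last point of each stroke and the first point of the next one."""
    ids = np.asarray(stroke_ids)
    adj = np.eye(len(ids), dtype=np.int8)
    # Position i is a stroke boundary when point i+1 starts a new stroke.
    ends = np.flatnonzero(ids[1:] != ids[:-1])
    adj[ends, ends + 1] = 1
    adj[ends + 1, ends] = 1
    return adj


# Toy sketch: five points, the first three in stroke 0, the last two in stroke 1.
stroke_ids = [0, 0, 0, 1, 1]
local_1 = k_hop_adjacency(stroke_ids, k=1)
local_2 = k_hop_adjacency(stroke_ids, k=2)
global_links = stroke_link_adjacency(stroke_ids)
```

In a Transformer-style model, matrices like these could serve as attention masks, so that each attention head only mixes information between nodes connected in its graph. Using a different graph per head or layer is one way to combine local, global, and temporal structure.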
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Graph Neural Network · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam
