Investigating transformers in the decomposition of polygonal shapes as point collections
Andrea Alfieri, Yancong Lin, Jan C. van Gemert

TL;DR
This paper explores how transformer architectures perform in decomposing polygonal shapes into point collections, emphasizing the importance of natural point orderings and auto-regressive prediction for complex shape representation.
Contribution
It demonstrates the benefits of auto-regressive decomposition of polygons into points for improved transformer-based shape analysis.
Findings
Auto-regressive prediction improves shape decomposition accuracy.
Natural point ordering enhances transformer performance.
Decomposing polygons into point collections aids visual shape understanding.
Abstract
Transformers can generate predictions in two ways: (1) auto-regressively, by conditioning each sequence element on the previous ones, or (2) in parallel, by directly producing the whole output sequence at once. While research has mostly explored this difference on sequential tasks in NLP, we study the difference between auto-regressive and parallel prediction on visual set prediction tasks, in particular on polygonal shapes in images, because polygons are representative of numerous types of objects, such as buildings or obstacles for aerial vehicles. This is challenging for deep learning architectures because a polygon can consist of a varying cardinality of points. We provide evidence of the importance of natural orders for Transformers and show the benefit of decomposing complex polygons into collections of points in an auto-regressive manner.
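To make the two prediction modes in the abstract concrete, here is a minimal sketch of how the decoding loops differ. It is not the authors' implementation. The function names (`decode_autoregressive`, `decode_parallel`) and the toy predictors are hypothetical stand-ins for a trained transformer, and the square's vertex order is simply assumed.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def decode_autoregressive(
    step: Callable[[List[Point]], Optional[Point]], max_points: int
) -> List[Point]:
    """Predict one vertex at a time, conditioning on all vertices so far.

    `step` stands in for a decoder call; returning None plays the role of
    an end-of-sequence token, so the number of points can vary.
    """
    points: List[Point] = []
    for _ in range(max_points):
        nxt = step(points)
        if nxt is None:
            break
        points.append(nxt)
    return points


def decode_parallel(
    predict_slot: Callable[[int], Optional[Point]], num_slots: int
) -> List[Point]:
    """Predict a fixed number of slots independently of each other.

    A slot returning None plays the role of a "no point" class, which is
    how a fixed-size output can represent a variable number of points.
    """
    slots = [predict_slot(i) for i in range(num_slots)]
    return [p for p in slots if p is not None]


# Toy target: a unit square with an assumed vertex order.
SQUARE: List[Point] = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def toy_step(prefix: List[Point]) -> Optional[Point]:
    # Hypothetical predictor: the next vertex depends on the prefix length.
    return SQUARE[len(prefix)] if len(prefix) < len(SQUARE) else None


def toy_slot(index: int) -> Optional[Point]:
    # Hypothetical predictor: each slot sees only its own index.
    return SQUARE[index] if index < len(SQUARE) else None


ar_points = decode_autoregressive(toy_step, max_points=10)
par_points = decode_parallel(toy_slot, num_slots=10)
```

In the auto-regressive loop each call receives the points emitted so far, so the order in which vertices are generated matters. In the parallel variant every slot is filled without access to the other slots' outputs.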
