CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
Trong-Thuan Nguyen, Pha Nguyen, Xin Li, Jackson Cothren, Alper Yilmaz, Khoa Luu

TL;DR
This paper introduces CYCLO, a cyclic graph transformer for modeling multi-object relationships in aerial videos, demonstrating superior performance on drone scene understanding and in-the-wild benchmarks.
Contribution
The paper presents a novel cyclic graph transformer architecture and the new AeroEye dataset for multi-object relationship modeling in aerial videos.
Findings
CYCLO outperforms existing models on the AeroEye dataset.
Achieves state-of-the-art results on PVSG and ASPIRe benchmarks.
Effectively captures cyclical and long-range temporal dependencies.
Abstract
Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. To this end, we propose the novel Cyclic Graph Transformer (CYCLO) approach that allows the model to capture both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The proposed approach also allows one to handle sequences with inherent cyclical patterns and process object relationships in the correct…
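The circular history update mentioned in the abstract can be illustrated with a small sketch. This is not the paper's architecture: it only assumes that "circular" means temporal indices wrap around the end of the clip, so the last frames can attend to the first ones. The helper names `cyclic_window` and `cyclic_attention`, the window size, and the use of raw features as both queries and keys are all hypothetical choices made for illustration.

```python
import math


def cyclic_window(t, num_frames, window):
    """Indices of `window` consecutive frames starting at t, wrapping past the end."""
    return [(t + s) % num_frames for s in range(window)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cyclic_attention(features, window):
    """Scaled dot-product attention restricted to a cyclic temporal window.

    features: list of T per-frame feature vectors (lists of floats).
    Assumption for illustration: the same vector serves as query, key, and value.
    Returns T output vectors, one per frame.
    """
    num_frames = len(features)
    dim = len(features[0])
    outputs = []
    for t in range(num_frames):
        idx = cyclic_window(t, num_frames, window)
        # Score frame t against every frame in its wrapped window.
        scores = [
            sum(q * k for q, k in zip(features[t], features[j])) / math.sqrt(dim)
            for j in idx
        ]
        weights = softmax(scores)
        # Weighted average of the window's feature vectors.
        outputs.append(
            [sum(w * features[j][d] for w, j in zip(weights, idx)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    print(cyclic_window(3, len(frames), 3))  # the window for the last frame wraps to frames 0 and 1
    print(cyclic_attention(frames, 3))
```

With a window of 1 each frame attends only to itself, so the output equals the input; larger windows mix in later frames and, near the end of the clip, the earliest frames.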
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Graph Theory and Algorithms
Methods: Attention Is All You Need · Laplacian EigenMap · Laplacian Positional Encodings · Softmax · Layer Normalization · Graph Transformer · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam
