REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation

Maëlic Neau; Paulo E. Santos; Anne-Gwenn Bosser; Cédric Buche; Akihiro Sugimoto

arXiv:2405.16116·cs.CV·September 24, 2025

TL;DR

REACT is a scene graph generation architecture that balances real-time inference speed, object detection accuracy, and relation prediction accuracy. It reports the fastest inference among existing SGG models: 2.7 times faster, with 58% better object detection accuracy and a 5.5x smaller model on average.

Contribution

REACT introduces a new architecture that significantly improves inference speed and object detection accuracy while reducing model size for scene graph generation.

Findings

01. REACT is 2.7 times faster than existing models.

02. REACT improves object detection accuracy by 58%.

03. REACT reduces model size by an average of 5.5x.

Abstract

Scene Graph Generation (SGG) is a task that encodes visual relationships between objects in images as graph structures. SGG shows significant promise as a foundational component for downstream tasks, such as reasoning for embodied agents. To enable real-time applications, SGG must address the trade-off between performance and inference speed. However, current methods tend to focus on one of the following: (1) improving relation prediction accuracy, (2) enhancing object detection accuracy, or (3) reducing latency, without aiming to balance all three objectives simultaneously. To address this limitation, we propose the Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation (REACT) architecture, which achieves the highest inference speed among existing SGG models, improving object detection accuracy without sacrificing relation prediction performance. Compared…
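To make the "graph structure" in the abstract concrete, the following is a minimal sketch of how a scene graph can be stored as detected objects plus (subject, predicate, object) relations. It is not taken from REACT or from the sgg-benchmark repository; the class names `DetectedObject` and `SceneGraph`, the box format, and the example labels are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DetectedObject:
    """One detected object (hypothetical container, not REACT's own type)."""
    label: str
    box: tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


@dataclass
class SceneGraph:
    """Objects are nodes; relations are directed, labeled edges between node indices."""
    objects: list[DetectedObject] = field(default_factory=list)
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str, box: tuple[float, float, float, float]) -> int:
        """Store a node and return its index."""
        self.objects.append(DetectedObject(label, box))
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Store a directed edge subj -> obj labeled with the predicate."""
        n = len(self.objects)
        if not (0 <= subj < n and 0 <= obj < n):
            raise IndexError("relation refers to an unknown object index")
        self.relations.append((subj, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Return relations as human-readable (subject, predicate, object) labels."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relations
        ]


# Illustrative example: two detections and one relation between them.
graph = SceneGraph()
person = graph.add_object("person", (10.0, 20.0, 110.0, 220.0))
horse = graph.add_object("horse", (60.0, 120.0, 300.0, 320.0))
graph.add_relation(person, "riding", horse)
print(graph.triples())
```

In this representation, object detection quality determines the nodes and relation prediction quality determines the edges, which is why the abstract treats them as separate objectives alongside latency.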

Code & Models

Repositories

maelic/sgg-benchmark
PyTorch · Official

Datasets

maelic/PSG-coco-format
dataset · 336 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Games · Human Motion and Animation

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus · You Only Look Once