DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes

Sergey Linok; Vadim Semenov; Anastasia Trunova; Oleg Bulichev; Dmitry Yudin

arXiv:2505.03581·cs.CV·May 7, 2025

TL;DR

DyGEnc introduces a novel method for encoding sequences of textual scene graphs to improve reasoning and question answering in dynamic scenes, outperforming visual models and enabling applications in robotics.

Contribution

The paper presents DyGEnc, a new approach that combines compressed spatial-temporal graph encoding with large language models for enhanced scene understanding.

Findings

01

Outperforms existing visual methods by 15-25% on question answering on the STAR and AGQA datasets.

02

Can be extended to process raw images by using foundation models.

03

Demonstrated effectiveness in robotic scene understanding and memory.

Abstract

The analysis of events in dynamic environments poses a fundamental challenge in the development of intelligent agents and robots capable of interacting with humans. Current approaches predominantly utilize visual models. However, these methods often capture information implicitly from images and lack interpretable spatial-temporal object representations. To address this issue, we introduce DyGEnc, a novel method for Encoding a Dynamic Graph. This method integrates a compressed spatial-temporal structural representation of observations with the cognitive capabilities of large language models. The purpose of this integration is to enable advanced question answering based on a sequence of textual scene graphs. Extended evaluations on the STAR and AGQA datasets indicate that DyGEnc outperforms existing visual methods by a large margin of 15-25% in addressing queries regarding the history of…
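
To make the input described in the abstract concrete, here is a minimal Python sketch of what a sequence of textual scene graphs and a question over it might look like. It does not reproduce DyGEnc's compressed graph encoder or its coupling to a language model; it only illustrates the data. The names `SceneGraph`, `serialize_sequence`, and `build_prompt`, the triplet layout, and the text format are all assumptions made for illustration and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout: one relation is a (subject, relation, object) triplet.
Triplet = Tuple[str, str, str]


@dataclass
class SceneGraph:
    """One textual scene graph: the relations observed at a single time step."""
    timestep: int
    triplets: List[Triplet]


def serialize_sequence(graphs: List[SceneGraph]) -> str:
    """Flatten a sequence of scene graphs into time-ordered plain text."""
    lines = []
    for graph in sorted(graphs, key=lambda g: g.timestep):
        facts = "; ".join(f"{s} {r} {o}" for s, r, o in graph.triplets)
        lines.append(f"t={graph.timestep}: {facts}")
    return "\n".join(lines)


def build_prompt(graphs: List[SceneGraph], question: str) -> str:
    """Combine the serialized history with a question (illustrative format only)."""
    history = serialize_sequence(graphs)
    return f"Scene graph history:\n{history}\nQuestion: {question}\nAnswer:"


# Example: two observations given out of order; serialization sorts them by time.
example_graphs = [
    SceneGraph(1, [("person", "holding", "cup")]),
    SceneGraph(0, [("person", "near", "table"), ("cup", "on", "table")]),
]
print(build_prompt(example_graphs, "What did the person pick up?"))
```

Feeding raw text like this to a language model would be the naive baseline; the abstract's point is that DyGEnc instead compresses the spatial-temporal structure before the language model reasons over it.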

Code & Models

Repositories

linukc/dygenc · PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques