DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
Sergey Linok, Vadim Semenov, Anastasia Trunova, Oleg Bulichev, Dmitry Yudin

TL;DR
DyGEnc introduces a novel method for encoding sequences of textual scene graphs to improve reasoning and question answering in dynamic scenes, outperforming visual models and enabling applications in robotics.
Contribution
The paper presents DyGEnc, a new approach that combines compressed spatial-temporal graph encoding with large language models for enhanced scene understanding.
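To make the input format concrete, the sketch below shows one way a sequence of textual scene graphs could be flattened into a prompt for a language model. It is only an illustration under stated assumptions: the `SceneGraph` class, the `graph_to_text` and `sequence_to_prompt` helpers, and the triplet layout are hypothetical names that do not come from the paper. It uses plain-text serialization and does not reproduce the compressed spatial-temporal encoding that DyGEnc itself uses.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: one (subject, relation, object) triplet per edge.
Triplet = Tuple[str, str, str]


@dataclass
class SceneGraph:
    """A scene graph observed at a single time step (illustrative only)."""
    timestamp: int
    triplets: List[Triplet]


def graph_to_text(graph: SceneGraph) -> str:
    """Serialize one scene graph as a single line of text."""
    facts = "; ".join(f"{s} {r} {o}" for s, r, o in graph.triplets)
    return f"t={graph.timestamp}: {facts}"


def sequence_to_prompt(graphs: List[SceneGraph], question: str) -> str:
    """Order the graphs by time and append the question.

    This is naive text concatenation, an assumption made for illustration,
    not the encoder described in the paper.
    """
    ordered = sorted(graphs, key=lambda g: g.timestamp)
    lines = [graph_to_text(g) for g in ordered]
    return "Scene history:\n" + "\n".join(lines) + f"\nQuestion: {question}"


if __name__ == "__main__":
    history = [
        SceneGraph(1, [("person", "holding", "cup")]),
        SceneGraph(0, [("person", "sitting_on", "sofa"), ("cup", "on", "table")]),
    ]
    print(sequence_to_prompt(history, "What did the person do after sitting?"))
```

A plain serialization like this grows linearly with the number of frames and edges, which is the kind of cost a compressed graph encoding is meant to avoid.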
Findings
Outperforms existing visual methods by 15-25% on question answering over the STAR and AGQA datasets.
Can be extended to process raw images through foundational models.
Demonstrated effectiveness in robotic scene understanding and memory.
Abstract
The analysis of events in dynamic environments poses a fundamental challenge in the development of intelligent agents and robots capable of interacting with humans. Current approaches predominantly utilize visual models. However, these methods often capture information implicitly from images, lacking interpretable spatial-temporal object representations. To address this issue we introduce DyGEnc - a novel method for Encoding a Dynamic Graph. This method integrates compressed spatial-temporal structural observation representation with the cognitive capabilities of large language models. The purpose of this integration is to enable advanced question answering based on a sequence of textual scene graphs. Extended evaluations on the STAR and AGQA datasets indicate that DyGEnc outperforms existing visual methods by a large margin of 15-25% in addressing queries regarding the history of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
