Understanding Long Videos via LLM-Powered Entity Relation Graphs

Meng Chu; Yicong Li; Tat-Seng Chua

arXiv:2501.15953 · cs.IR · January 28, 2025

Open Access

TL;DR

This paper introduces GraphVideoAgent, a novel system that uses graph-based object tracking combined with large language models to improve understanding of long videos by capturing temporal relationships and interactions more effectively.

Contribution

The paper presents a dynamic graph framework integrated with LLMs for enhanced long video analysis, outperforming existing methods in accuracy and efficiency.

Findings

01

Achieved a 2.2-point improvement over existing methods on the EgoSchema dataset

02

Achieved a 2.0-point improvement over existing methods on the NExT-QA benchmark

03

Required analyzing only around 8 frames per video on average

Abstract

The analysis of extended video content poses unique challenges in artificial intelligence, particularly when dealing with the complexity of tracking and understanding visual elements across time. Current methodologies that process video frames sequentially struggle to maintain coherent tracking of objects, especially when these objects temporarily vanish and later reappear in the footage. A critical limitation of these approaches is their inability to effectively identify crucial moments in the video, largely due to their limited grasp of temporal relationships. To overcome these obstacles, we present GraphVideoAgent, a cutting-edge system that leverages the power of graph-based object tracking in conjunction with large language model capabilities. At its core, our framework employs a dynamic graph structure that maps and monitors the evolving relationships between visual entities…
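
To make the idea of a dynamic graph over visual entities concrete, here is a minimal Python sketch. It is not the authors' implementation: the class name `EntityGraph`, its methods, and the `min_gap` threshold are hypothetical and chosen only for illustration. It assumes frames are observed in increasing order and that some upstream detector (not shown) supplies entity names and relation labels per frame.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EntityGraph:
    """Toy dynamic graph: nodes are entities, edges are time-stamped relations.

    Hypothetical illustration only; not the GraphVideoAgent data structure.
    """
    # entity name -> frame indices where it was observed (in observation order)
    sightings: dict = field(default_factory=lambda: defaultdict(list))
    # (subject, object) -> list of (frame, relation label)
    relations: dict = field(default_factory=lambda: defaultdict(list))

    def observe(self, frame, entities, relations=()):
        """Record the entities seen in a frame and any relations between them."""
        for name in entities:
            self.sightings[name].append(frame)
        for subject, label, obj in relations:
            self.relations[(subject, obj)].append((frame, label))

    def gaps(self, name, min_gap=2):
        """Return (last_seen, seen_again) pairs where an entity vanished and reappeared."""
        frames = self.sightings[name]
        return [(p, q) for p, q in zip(frames, frames[1:]) if q - p >= min_gap]

    def history(self, subject, obj):
        """Return the time-ordered relation labels between two entities."""
        return list(self.relations[(subject, obj)])


graph = EntityGraph()
graph.observe(0, ["person", "cup"], [("person", "holds", "cup")])
graph.observe(1, ["person"])
graph.observe(5, ["person", "cup"], [("person", "drinks from", "cup")])

print(graph.gaps("cup"))               # [(0, 5)]
print(graph.history("person", "cup"))  # [(0, 'holds'), (5, 'drinks from')]
```

Keeping sightings per entity, rather than per frame, is what lets the sketch notice that the cup disappeared after frame 0 and returned at frame 5, the kind of vanish-and-reappear case the abstract says sequential frame processing handles poorly.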

Taxonomy

Topics: Video Analysis and Summarization · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis