GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning

Shikhhar Siingh; Abhinav Rawat; Chitta Baral; Vivek Gupta

arXiv:2505.21863 · cs.CV · June 4, 2025

TL;DR

GETReason is a hierarchical multi-agent framework that improves image understanding by extracting global event, temporal, and geospatial context, validated by a new reasoning-based evaluation metric.

Contribution

The paper introduces GETReason, a novel multi-agent reasoning framework that extracts image context beyond surface descriptions by incorporating geospatial and temporal information.

Findings

1. Effective inference of deeper contextual meaning from images.
2. Introduction of GREAT, a new metric for reasoning-based image evaluation.
3. Demonstrated improved understanding linking images to broader events.

Abstract

Publicly significant images from events hold valuable contextual information, crucial for journalism and education. However, existing methods often struggle to extract this relevance accurately. To address this, we introduce GETReason (Geospatial Event Temporal Reasoning), a framework that moves beyond surface-level image descriptions to infer deeper contextual meaning. We propose that extracting global event, temporal, and geospatial information enhances understanding of an image's significance. Additionally, we introduce GREAT (Geospatial Reasoning and Event Accuracy with Temporal Alignment), a new metric for evaluating reasoning-based image understanding. Our layered multi-agent approach, assessed using a reasoning-weighted metric, demonstrates that meaningful insights can be inferred, effectively linking images to their broader event context.
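
To make the idea of a layered multi-agent approach concrete, the following is a minimal sketch and not the authors' implementation. The abstract names only the three kinds of context (global event, temporal, geospatial); everything else here is an assumption made for illustration: the function names `event_agent`, `temporal_agent`, `geospatial_agent`, and `run_layers`, the ordering of the layers, and the placeholder strings that stand in for real model output.

```python
from typing import Callable, Dict, List

# Shared context accumulated across layers (hypothetical structure).
Context = Dict[str, str]
# An agent takes an image identifier and the context so far, and returns new fields.
Agent = Callable[[str, Context], Context]


def event_agent(image_id: str, ctx: Context) -> Context:
    # Placeholder: a real agent would query a vision-language model here.
    return {"event": f"<event inferred for {image_id}>"}


def temporal_agent(image_id: str, ctx: Context) -> Context:
    # Assumption: temporal reasoning is conditioned on the event found earlier.
    return {"temporal": f"<time inferred given {ctx['event']}>"}


def geospatial_agent(image_id: str, ctx: Context) -> Context:
    # Assumption: geospatial reasoning is also conditioned on the event.
    return {"geospatial": f"<place inferred given {ctx['event']}>"}


def run_layers(image_id: str, layers: List[List[Agent]]) -> Context:
    """Run agents layer by layer; each layer sees only earlier layers' output."""
    ctx: Context = {}
    for layer in layers:
        updates: Context = {}
        for agent in layer:
            updates.update(agent(image_id, ctx))
        # Merge after the whole layer has run, so agents in the same
        # layer do not depend on each other.
        ctx.update(updates)
    return ctx


# Hypothetical layering: event first, then temporal and geospatial together.
LAYERS: List[List[Agent]] = [[event_agent], [temporal_agent, geospatial_agent]]

context = run_layers("img_001", LAYERS)
```

Merging each layer's output only after the layer finishes keeps the dependency direction explicit: later agents can build on earlier conclusions, while agents within one layer remain independent of each other.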

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques