GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
Shikhhar Siingh, Abhinav Rawat, Chitta Baral, Vivek Gupta

TL;DR
GETReason is a hierarchical multi-agent framework that improves image understanding by extracting global event, temporal, and geospatial context, validated by a new reasoning-based evaluation metric.
Contribution
The paper introduces GETReason (Geospatial Event Temporal Reasoning), a multi-agent reasoning framework that extracts image context beyond surface descriptions by incorporating geospatial and temporal information.
Findings
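The layered multi-agent approach infers contextual meaning that goes beyond surface-level image descriptions.
GREAT (Geospatial Reasoning and Event Accuracy with Temporal Alignment) is introduced as a new metric for evaluating reasoning-based image understanding.
Assessed with a reasoning-weighted metric, the approach links images to their broader event context.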
Abstract
Publicly significant images from events hold valuable contextual information, crucial for journalism and education. However, existing methods often struggle to extract this relevance accurately. To address this, we introduce GETReason (Geospatial Event Temporal Reasoning), a framework that moves beyond surface-level image descriptions to infer deeper contextual meaning. We propose that extracting global event, temporal, and geospatial information enhances understanding of an image's significance. Additionally, we introduce GREAT (Geospatial Reasoning and Event Accuracy with Temporal Alignment), a new metric for evaluating reasoning-based image understanding. Our layered multi-agent approach, assessed using a reasoning-weighted metric, demonstrates that meaningful insights can be inferred, effectively linking images to their broader event context.
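The abstract does not say how GREAT combines its components, so the following is only a minimal sketch of what a reasoning-weighted score over event, temporal, and geospatial context could look like. The class `ContextScores`, the function `weighted_context_score`, the four component names, and all weights are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ContextScores:
    """Per-component scores in [0, 1]; names are illustrative, not the paper's."""
    event: float       # agreement on the inferred event
    temporal: float    # agreement on the inferred time
    geospatial: float  # agreement on the inferred location
    reasoning: float   # quality of the supporting reasoning


def weighted_context_score(
    scores: ContextScores,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return a weighted average of the component scores.

    The default equal weights are an assumption made for illustration only.
    """
    if weights is None:
        weights = {"event": 1.0, "temporal": 1.0, "geospatial": 1.0, "reasoning": 1.0}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the total weight so the result stays in [0, 1].
    return sum(w * getattr(scores, name) for name, w in weights.items()) / total


if __name__ == "__main__":
    example = ContextScores(event=1.0, temporal=1.0, geospatial=1.0, reasoning=0.0)
    # Doubling the (hypothetical) reasoning weight penalises weak reasoning more.
    print(weighted_context_score(example, {"event": 1, "temporal": 1, "geospatial": 1, "reasoning": 2}))
```

Giving the reasoning component a larger weight is one simple way to make a composite score "reasoning-weighted"; the actual GREAT definition may differ.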
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
