GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
Fenghua Cheng, Jinxiang Wang, Sen Wang, Zi Huang, Xue Li

TL;DR
This paper introduces GeoGuess, a challenging multimodal reasoning task involving identifying street view locations and explaining visual clues, supported by a new dataset and a hierarchical reasoning method called SightSense.
Contribution
It proposes a novel task, GeoGuess, emphasizing hierarchical visual reasoning, and provides a new dataset GeoExplain along with the SightSense method for improved multimodal reasoning.
Findings
SightSense achieves outstanding performance on GeoGuess.
The GeoExplain dataset effectively supports hierarchical visual reasoning.
GeoGuess advances understanding of multimodal reasoning with hierarchical visual clues.
Abstract
Multimodal reasoning is the process of understanding, integrating, and inferring information across different data modalities. It has recently attracted surging academic attention as a benchmark for Artificial Intelligence (AI). Although various tasks exist for evaluating multimodal reasoning ability, they still have limitations. Reasoning over hierarchical visual clues at different levels of granularity, e.g., local details and global context, has received little discussion, despite its frequent involvement in real scenarios. To bridge the gap, we introduce a novel and challenging task for multimodal reasoning, namely GeoGuess. Given a street view image, the task is to identify its location and provide a detailed explanation. A system that succeeds in GeoGuess should be able to detect tiny visual clues, perceive the broader landscape, and associate them with vast geographic knowledge.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Data Visualization and Analytics
