PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad Ayyubi, Xuande Feng, Junzhang Liu, Xudong Lin, Zhecan Wang, Shih-Fu Chang

TL;DR
PuzzleGPT emulates human puzzle-solving skills to predict time and location from images. It achieves state-of-the-art zero-shot performance by combining modules for visual clue perception, reasoning, external knowledge retrieval, and robustness.
Contribution
The paper introduces PuzzleGPT, a modular expert pipeline that formalizes and emulates human-like puzzle-solving abilities for complex image-based predictions.
Findings
Achieves state-of-the-art zero-shot performance on the TARA and WikiTilo datasets.
Outperforms large vision-language models (BLIP-2, InstructBLIP, LLaVA), GPT-4V, and the automatically generated reasoning pipeline VisProg on this task.
Rivals or surpasses finetuned models in accuracy.
Abstract
The task of predicting time and location from images is challenging and requires complex human-like puzzle-solving ability over different clues. In this work, we formalize this ability into core skills and implement them using different modules in an expert pipeline called PuzzleGPT. PuzzleGPT consists of a perceiver to identify visual clues, a reasoner to deduce prediction candidates, a combiner to combinatorially combine information from different clues, a web retriever to get external knowledge if the task can't be solved locally, and a noise filter for robustness. This results in a zero-shot, interpretable, and robust approach that records state-of-the-art performance on two datasets -- TARA and WikiTilo. PuzzleGPT outperforms large VLMs such as BLIP-2, InstructBLIP, LLaVA, and even GPT-4V, as well as automatically generated reasoning pipelines like VisProg, by at least 32% and 38%,…
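To make the division of labour between the five modules concrete, here is a toy sketch of how a perceiver, reasoner, web retriever, noise filter, and combiner might be chained for location prediction. The abstract does not specify module interfaces, so everything in this block is hypothetical: the function names, the dictionary "knowledge bases", the confidence threshold, the rule that the noise filter drops clues with no candidates, and the combiner strategy of intersecting candidate sets over clue subsets from largest to smallest. In PuzzleGPT these roles are filled by models and real web search, not lookup tables.

```python
from itertools import combinations

# Hypothetical toy data standing in for the reasoner's knowledge and the web.
LOCAL_KB: dict[str, set[str]] = {
    "red double-decker bus": {"London", "Hong Kong"},
    "left-hand traffic": {"London", "Hong Kong", "Tokyo"},
    "Cantonese signage": {"Hong Kong"},
}
WEB_KB: dict[str, set[str]] = {
    "Tsim Sha Tsui clock tower": {"Hong Kong"},
}


def perceiver(detections: list[tuple[str, float]], min_score: float = 0.5) -> list[str]:
    """Keep visual clues whose (hypothetical) detection confidence clears a threshold."""
    return [label for label, score in detections if score >= min_score]


def reasoner(clue: str) -> set[str]:
    """Deduce candidate answers for one clue from local knowledge; empty if unknown."""
    return set(LOCAL_KB.get(clue, set()))


def web_retriever(clue: str) -> set[str]:
    """Fallback lookup used only when the clue cannot be resolved locally."""
    return set(WEB_KB.get(clue, set()))


def noise_filter(clue_candidates: dict[str, set[str]]) -> dict[str, set[str]]:
    """Treat clues that yielded no candidates at all as noise and drop them."""
    return {clue: cands for clue, cands in clue_candidates.items() if cands}


def combiner(clue_candidates: dict[str, set[str]]) -> set[str]:
    """Try clue subsets from largest to smallest and return the first non-empty
    intersection of their candidate sets (first in enumeration order)."""
    clues = list(clue_candidates)
    for k in range(len(clues), 0, -1):
        for subset in combinations(clues, k):
            common = set.intersection(*(clue_candidates[c] for c in subset))
            if common:
                return common
    return set()


def puzzle_pipeline(detections: list[tuple[str, float]]) -> set[str]:
    clues = perceiver(detections)
    # Reason about each clue locally, falling back to retrieval when that fails.
    clue_candidates = {clue: reasoner(clue) or web_retriever(clue) for clue in clues}
    return combiner(noise_filter(clue_candidates))


if __name__ == "__main__":
    detections = [
        ("red double-decker bus", 0.9),
        ("left-hand traffic", 0.8),
        ("Cantonese signage", 0.7),
        ("blurry logo", 0.2),                  # below threshold: never becomes a clue
        ("Tsim Sha Tsui clock tower", 0.6),    # unknown locally: resolved by retrieval
    ]
    print(puzzle_pipeline(detections))
```

Running the example prints `{'Hong Kong'}`: no single clue other than the signage and the clock tower is decisive, but the intersection over all four surviving clues is. Searching subsets from largest to smallest means that if one clue contradicts the rest, the combiner falls back to the largest consistent subset instead of returning nothing.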
Taxonomy
Topics: Data Management and Algorithms · Geographic Information Systems Studies · Human Mobility and Location-Based Analysis
