QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning

Weimin Shi; Mingchen Zhuge; Dehong Gao; Zhong Zhou; Ming-Ming Cheng; Deng-Ping Fan

arXiv:2302.00952 · cs.CV · June 29, 2023 · 1 cites

TL;DR

QR-CLIP is a novel model that leverages open-world knowledge to improve location and time reasoning from images, outperforming previous methods significantly.

Contribution

The paper introduces QR-CLIP, a new model inspired by Horn's QR theory, integrating open-world knowledge for enhanced location and time inference from images.

Findings

01 Outperforms previous SOTA by about 10% in location reasoning

02 Achieves approximately 130% relative improvement in time reasoning

03 Establishes a technical foundation for open-world knowledge integration in reasoning tasks
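
A relative improvement is the gain divided by the baseline score, so a 130% relative improvement means the new score is 2.3 times the baseline. The sketch below only illustrates that arithmetic; the function name and the two scores are hypothetical placeholders, not results reported in the paper.

```python
def relative_lift(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline` as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical scores chosen only to illustrate the arithmetic.
example_baseline = 10.0
example_improved = 23.0

lift = relative_lift(example_baseline, example_improved)
print(f"relative lift: {lift:.0%}")
```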

Abstract

Everyday images may convey abstract meanings that require us to recall and infer deeper information from them. To encourage such human-like reasoning, in this work we teach machines to predict where and when an image was taken, rather than performing basic tasks such as traditional segmentation or classification. Inspired by Horn's QR theory, we design a novel QR-CLIP model consisting of two components: 1) the Quantity module first recalls additional open-world knowledge as candidate language inputs; 2) the Relevance module carefully weighs vision and language cues and infers the location and time. Experiments demonstrate QR-CLIP's effectiveness: it outperforms the previous SOTA by an average relative lift of about 10% on location reasoning and about 130% on time reasoning. This study lays a technical foundation for location and time reasoning and suggests that effectively…
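
The two-stage idea in the abstract (first gather candidate knowledge, then weigh it against the image before predicting) can be sketched as a toy retrieve-then-score pipeline. The abstract does not say how either module is implemented, so everything below is an assumption for illustration: the function names, the cosine-similarity retrieval, the softmax weighting, and the hand-written 3-dimensional vectors that stand in for real image and text embeddings. It is not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def quantity_step(image_vec, knowledge, k):
    """Hypothetical 'Quantity' stage: keep the k snippets closest to the image."""
    ranked = sorted(knowledge, key=lambda item: cosine(image_vec, item[1]), reverse=True)
    return ranked[:k]


def relevance_step(image_vec, retrieved):
    """Hypothetical 'Relevance' stage: softmax-weight snippets and fuse with the image."""
    sims = [cosine(image_vec, vec) for _, vec in retrieved]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]  # subtract the max for numerical stability
    total = sum(exps)
    fused = list(image_vec)
    for e, (_, vec) in zip(exps, retrieved):
        weight = e / total
        fused = [f + weight * v for f, v in zip(fused, vec)]
    return fused


def predict(fused, candidates):
    """Pick the candidate label whose vector is most similar to the fused vector."""
    return max(candidates, key=lambda name: cosine(fused, candidates[name]))


# Toy stand-ins for embeddings; real systems would use learned encoders.
image_vec = [1.0, 0.2, 0.0]
knowledge = [
    ("The Eiffel Tower is in Paris", [0.9, 0.1, 0.0]),
    ("Kangaroos live in Australia", [0.0, 0.1, 1.0]),
    ("The Louvre is a museum in Paris", [0.8, 0.3, 0.1]),
]
candidates = {"France": [1.0, 0.1, 0.0], "Australia": [0.0, 0.0, 1.0]}

retrieved = quantity_step(image_vec, knowledge, k=2)
fused = relevance_step(image_vec, retrieved)
prediction = predict(fused, candidates)
print([text for text, _ in retrieved], prediction)
```

The same structure would apply to time prediction by swapping the candidate labels for time periods; only the candidate set changes, not the retrieve-then-weigh flow.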

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning