CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection

Jiajin Tang; Ge Zheng; Jingyi Yu; Sibei Yang

arXiv:2309.01093 · cs.CV · September 6, 2023

TL;DR

This paper introduces CoTDet, a task-driven object detection framework that prompts large language models for affordance knowledge through multi-level reasoning. Affordance knowledge here means the common attributes that let different objects accomplish the same task. Using it, CoTDet detects task-suitable objects beyond a closed set of object categories.

Contribution

It proposes multi-level chain-of-thought prompting (MLCoT) to extract affordance knowledge from large language models, and uses that knowledge to detect objects suitable for a given task.

Findings

01 Consistent, significant gains over state-of-the-art methods (+15.6 box AP, +14.8 mask AP)

02 Generates rationales explaining why objects afford a task

03 Uses large language models effectively to extract affordance knowledge

Abstract

Task driven object detection aims to detect object instances suitable for affording a task in an image. Its challenge lies in object categories available for the task being too diverse to be limited to a closed set of object vocabulary for traditional object detection. Simply mapping categories and visual features of common objects to the task cannot address the challenge. In this paper, we propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task. Moreover, we propose a novel multi-level chain-of-thought prompting (MLCoT) to extract the affordance knowledge from large language models, which contains multi-level reasoning steps from task to object examples to essential visual attributes with rationales. Furthermore, to fully exploit knowledge to benefit object recognition and localization,…
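The abstract describes MLCoT as reasoning from a task to object examples to essential visual attributes with rationales. The sketch below shows one hypothetical way such a prompt chain could be wired, assuming a generic text-in/text-out LLM callable. It is not the authors' code or prompts. The names `multi_level_prompting`, `parse_items`, `AffordanceKnowledge`, `min_support`, and `fake_llm`, the prompt wording, and the frequency-based aggregation of "common" attributes are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class AffordanceKnowledge:
    task: str
    objects: List[str]                # level 1: object examples for the task
    rationales: Dict[str, str]        # level 2: why each object suits the task
    attributes: Dict[str, List[str]]  # level 3: visual attributes per object
    common: List[str]                 # attributes shared by >= min_support objects


def parse_items(text: str) -> List[str]:
    """Split a comma/newline-separated completion into clean lowercase items."""
    parts = text.replace("\n", ",").split(",")
    return [p.strip(" -*.").lower() for p in parts if p.strip(" -*.")]


def multi_level_prompting(task: str, llm: LLM, min_support: int = 2) -> AffordanceKnowledge:
    # Level 1: task -> object examples.
    objects = parse_items(llm(f"List objects a person could use to {task}."))
    rationales: Dict[str, str] = {}
    attributes: Dict[str, List[str]] = {}
    for obj in objects:
        # Level 2: object -> rationale for why it can afford the task.
        rationales[obj] = llm(f"Why can a {obj} be used to {task}?").strip()
        # Level 3: rationale -> visual attributes grounded in that rationale.
        attributes[obj] = parse_items(
            llm(f"Given that {rationales[obj]}, list visual attributes of a {obj} "
                f"that let it {task}.")
        )
    # Assumption: keep attributes that recur across objects as a crude proxy for
    # "common attributes that enable different objects to accomplish the task".
    counts = Counter(a for attrs in attributes.values() for a in set(attrs))
    common = sorted(a for a, n in counts.items() if n >= min_support)
    return AffordanceKnowledge(task, objects, rationales, attributes, common)


def fake_llm(prompt: str) -> str:
    """Hypothetical canned stub standing in for a real language model."""
    if prompt.startswith("List objects"):
        return "hammer, rock\nshoe"
    if prompt.startswith("Why"):
        return "it is heavy and rigid"
    if "hammer" in prompt:
        return "flat hard surface, graspable handle"
    if "rock" in prompt:
        return "flat hard surface, heavy"
    return "flat hard surface, graspable handle"


if __name__ == "__main__":
    knowledge = multi_level_prompting("pound a nail", fake_llm)
    print(knowledge.common)  # ['flat hard surface', 'graspable handle']
```

Counting attributes across objects is only a simple way to turn per-object answers into task-level attributes. The abstract above is truncated, so it does not specify how CoTDet aggregates the knowledge or how that knowledge conditions recognition and localization in the detector.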

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Robot Manipulation and Learning