Phrase-Based Affordance Detection via Cyclic Bilateral Interaction

Liangsheng Lu; Wei Zhai; Hongchen Luo; Yu Kang; Yang Cao

arXiv:2202.12076 · cs.CV · February 28, 2022

TL;DR

This paper introduces CBCE-Net, a novel vision-language model that detects object affordances based on action phrases, improving accuracy through cyclic bilateral interaction and mutual feature alignment.

Contribution

The paper proposes CBCE-Net, a cyclic bilateral consistency enhancement network for phrase-based affordance detection, and extends a dataset with phrase annotations.

Findings

01. Outperforms nine baseline methods in accuracy and visual quality

02. Effectively aligns language and vision features through cyclic interaction

03. Enhances affordance detection with a new dataset extension

Abstract

Affordance detection, which refers to perceiving objects with potential action possibilities in images, is a challenging task since the possible affordance depends on the person's purpose in real-world application scenarios. Existing works mainly extract the inherent human-object dependencies from images or videos to accommodate affordance properties that change dynamically. In this paper, we explore perceiving affordance from a vision-language perspective and consider the challenging phrase-based affordance detection problem, i.e., given a set of phrases describing the action purposes, all the object regions in a scene with the same affordance should be detected. To this end, we propose a cyclic bilateral consistency enhancement network (CBCE-Net) to align language and vision features progressively. Specifically, the presented CBCE-Net consists of a mutual guided vision-language module…
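The idea of aligning language and vision features progressively can be illustrated with a small, generic sketch. This is not CBCE-Net and not the authors' module: it assumes plain residual cross-attention with no learned weights, and the names `attend`, `cyclic_bilateral_interaction`, and `affordance_mask` are hypothetical. It only shows the shape of a loop in which language features update vision features, the updated vision features update the language features in turn, and a per-region decision is read off at the end.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys):
    """Residual cross-attention (assumption: keys double as values).

    Each query vector is updated by adding a softmax-weighted
    average of the key vectors.
    """
    scale = math.sqrt(len(keys[0]))
    updated = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        context = [
            sum(w * k[d] for w, k in zip(weights, keys))
            for d in range(len(q))
        ]
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


def cyclic_bilateral_interaction(vision, language, cycles=2):
    """Hypothetical loop: alternate the direction of guidance each cycle."""
    for _ in range(cycles):
        vision = attend(vision, language)    # language guides vision
        language = attend(language, vision)  # updated vision guides language
    return vision, language


def affordance_mask(vision, language, threshold=0.5):
    """Hypothetical readout: mark regions whose feature agrees with
    the mean phrase feature (sigmoid of the dot product)."""
    dim = len(language[0])
    phrase = [sum(tok[d] for tok in language) / len(language) for d in range(dim)]
    scores = [1.0 / (1.0 + math.exp(-dot(v, phrase))) for v in vision]
    return [1 if s > threshold else 0 for s in scores]


# Toy inputs: three region features and two phrase-token features, dimension 2.
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
tokens = [[1.0, 0.0], [0.8, 0.2]]

aligned_regions, aligned_tokens = cyclic_bilateral_interaction(regions, tokens)
mask = affordance_mask(aligned_regions, aligned_tokens)
```

Because nothing here is trained, the resulting mask carries no meaning beyond demonstrating the data flow; a real model would use learned projections and a segmentation head on dense feature maps.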

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Robot Manipulation and Learning