Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Liangsheng Lu, Wei Zhai, Hongchen Luo, Yu Kang, Yang Cao

TL;DR
This paper introduces CBCE-Net, a novel vision-language model that detects object affordances based on action phrases, improving accuracy through cyclic bilateral interaction and mutual feature alignment.
Contribution
It proposes a cyclic bilateral interaction network for phrase-based affordance detection and extends a dataset with phrase annotations, advancing vision-language affordance understanding.
Findings
Outperforms nine baseline methods in accuracy and visual quality
Effectively aligns language and vision features through cyclic interaction
Enhances affordance detection with a new dataset extension
Abstract
Affordance detection, which refers to perceiving objects with potential action possibilities in images, is a challenging task since the possible affordance depends on the person's purpose in real-world application scenarios. Existing works mainly extract the inherent human-object dependencies from images or video to accommodate affordance properties that change dynamically. In this paper, we explore perceiving affordance from a vision-language perspective and consider the challenging phrase-based affordance detection problem, i.e., given a set of phrases describing the action purposes, all the object regions in a scene with the same affordance should be detected. To this end, we propose a cyclic bilateral consistency enhancement network (CBCE-Net) to align language and vision features progressively. Specifically, the presented CBCE-Net consists of a mutual guided vision-language module…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Robot Manipulation and Learning
