YOLOA: Real-Time Affordance Detection via LLM Adapter

Yuqi Ji; Junjie Ke; Lihuo He; Jun Liu; Kaifan Zhang; Yu-Kun Lai; Guiguang Ding; Xinbo Gao

arXiv:2512.03418 · cs.CV · December 4, 2025

TL;DR

YOLOA is a real-time affordance detection model that jointly predicts object classes, locations, and affordances using a lightweight detector enhanced by an LLM adapter, achieving state-of-the-art accuracy and efficiency.

Contribution

It introduces YOLOA, a real-time affordance detection framework that uses an LLM adapter to improve the joint understanding of "what", "where", and "how" in embodied AI.

Findings

1. Achieves 52.8 mAP on ADG-Det and 73.1 mAP on IIT-Heat.
2. Runs at up to 89.77 FPS; a lightweight variant reaches up to 846.24 FPS.
3. Outperforms previous methods in both accuracy and real-time performance.
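To make the speed figures above more concrete, the short calculation below converts the reported throughput into per-frame latency. It is a back-of-the-envelope sketch: this summary does not say whether both FPS numbers were measured on the same hardware, batch size, or pipeline (e.g. with or without preprocessing), so the ratio is only indicative. The helper name `fps_to_ms` is hypothetical.

```python
def fps_to_ms(fps: float) -> float:
    """Convert throughput in frames per second to per-frame latency in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

# FPS figures quoted in the findings; measurement conditions are assumed comparable.
full_ms = fps_to_ms(89.77)    # full model: about 11.14 ms per frame
lite_ms = fps_to_ms(846.24)   # lightweight variant: about 1.18 ms per frame
speedup = 846.24 / 89.77      # about 9.43x higher throughput for the lightweight variant

print(f"{full_ms:.2f} ms, {lite_ms:.2f} ms, {speedup:.2f}x")
```

Even the full model's roughly 11 ms per frame sits well within a 30 FPS video budget of about 33 ms per frame.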

Abstract

Affordance detection aims to jointly address the fundamental "what-where-how" challenge in embodied AI by understanding "what" an object is, "where" the object is located, and "how" it can be used. However, most affordance learning methods focus solely on "how" objects can be used while neglecting the "what" and "where" aspects. Other affordance detection methods treat object detection and affordance learning as two independent tasks, lacking effective interaction and real-time capability. To overcome these limitations, we introduce YOLO Affordance (YOLOA), a real-time affordance detection model that jointly handles these two tasks via a large language model (LLM) adapter. Specifically, YOLOA employs a lightweight detector consisting of object detection and affordance learning branches refined through the LLM Adapter. During training, the LLM Adapter interacts with object and affordance…
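The abstract frames affordance detection as answering "what", "where", and "how" for each object in one pass. The sketch below shows one way such a joint output could be represented and assembled from two index-aligned branches. It is purely illustrative: the record layout, the `merge_branches` helper, the score dictionaries, and the 0.5 threshold are all hypothetical assumptions, not YOLOA's actual interface, and the LLM adapter interaction described in the abstract is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)

# Hypothetical record for one detected object; field names are illustrative only.
@dataclass
class AffordanceDetection:
    label: str                                             # "what": object class
    box: Box                                               # "where": bounding box
    affordances: List[str] = field(default_factory=list)   # "how": supported actions

def merge_branches(object_preds: List[Tuple[str, Box]],
                   affordance_preds: List[Dict[str, float]],
                   threshold: float = 0.5) -> List[AffordanceDetection]:
    """Pair a detection branch's outputs with an affordance branch's outputs.

    Both lists are assumed to be index-aligned (one entry per object).
    Affordances scoring at or above `threshold` are kept, sorted by name.
    """
    if len(object_preds) != len(affordance_preds):
        raise ValueError("branches must produce one prediction per object")
    results = []
    for (label, box), scores in zip(object_preds, affordance_preds):
        kept = sorted(name for name, score in scores.items() if score >= threshold)
        results.append(AffordanceDetection(label, box, kept))
    return results

# Usage with made-up scores for a single object.
dets = merge_branches([("knife", (10.0, 20.0, 120.0, 60.0))],
                      [{"cut": 0.9, "grasp": 0.7, "pour": 0.1}])
print(dets[0].label, dets[0].affordances)
```

Keeping class, box, and affordances in one record reflects the abstract's point that detection and affordance learning should not be treated as two independent tasks.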

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications