CRAFT-E: A Neuro-Symbolic Framework for Embodied Affordance Grounding
Zhou Chen, Joe Lin, Carson Bulgin, Sathyanarayanan N. Aakur

TL;DR
CRAFT-E is a neuro-symbolic framework for assistive robots that grounds language-based action queries to objects in an interpretable and reliable way by combining structured knowledge graphs, visual-language alignment, and grasp reasoning.
Contribution
The paper introduces a modular neuro-symbolic system for affordance grounding that integrates symbolic reasoning with embodied perception, and it provides a new benchmark dataset for evaluation.
Findings
Achieves competitive performance on static-scene and real-world tasks
Remains robust under perceptual noise
Provides transparent, component-level diagnostics
Abstract
Assistive robots operating in unstructured environments must understand not only what objects are, but what they can be used for. This requires grounding language-based action queries to objects that both afford the requested function and can be physically retrieved. Existing approaches often rely on black-box models or fixed affordance labels, limiting transparency, controllability, and reliability for human-facing applications. We introduce CRAFT-E, a modular neuro-symbolic framework that composes a structured verb-property-object knowledge graph with visual-language alignment and energy-based grasp reasoning. The system generates interpretable grounding paths that expose the factors influencing object selection and incorporates grasp feasibility as an integral part of affordance inference. We further construct a benchmark dataset with unified annotations for verb-object…
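To make the composition described in the abstract more concrete, here is a minimal Python sketch of how a verb-property-object lookup, a visual-language alignment score, and a grasp energy might be combined into a ranked list with an explicit grounding path. Everything in it is an assumption for illustration: the toy graph contents, the `ground` function, the `Grounding` record, the linear scoring rule, and the `grasp_weight` parameter are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge graph (not from the paper):
# verbs map to the properties they require, objects map to the properties they have.
VERB_PROPERTIES = {"cut": {"sharp"}, "drink": {"holds_liquid"}}
OBJECT_PROPERTIES = {
    "knife": {"sharp", "rigid"},
    "scissors": {"sharp"},
    "mug": {"holds_liquid", "rigid"},
}


@dataclass
class Grounding:
    obj: str
    path: tuple   # (verb, property, object): the interpretable grounding path
    score: float  # higher is better


def ground(verb, detections, grasp_energy, grasp_weight=1.0):
    """Rank detected objects for a verb query.

    detections:   object name -> visual-language alignment score (assumed in [0, 1])
    grasp_energy: object name -> grasp energy (assumed: lower means easier to grasp)
    The scoring rule below is an illustrative assumption, not the paper's formula.
    """
    required = VERB_PROPERTIES.get(verb, set())
    candidates = []
    for obj, alignment in detections.items():
        shared = required & OBJECT_PROPERTIES.get(obj, set())
        if not shared:
            continue  # the graph offers no verb-property-object path for this object
        prop = sorted(shared)[0]  # pick one linking property deterministically
        score = alignment - grasp_weight * grasp_energy.get(obj, 0.0)
        candidates.append(Grounding(obj, (verb, prop, obj), score))
    return sorted(candidates, key=lambda g: g.score, reverse=True)


# Made-up inputs: the knife aligns well visually but is hard to grasp.
detections = {"knife": 0.9, "scissors": 0.8, "mug": 0.95}
grasp_energy = {"knife": 0.7, "scissors": 0.2, "mug": 0.1}
ranked = ground("cut", detections, grasp_energy)
for g in ranked:
    print(g.obj, g.path, round(g.score, 2))
```

In this toy setup the mug is filtered out because the graph has no path linking it to "cut", and the scissors outrank the knife because their lower grasp energy outweighs the knife's higher alignment score. The returned path is what would let a user see why an object was chosen.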
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
