The Robot's Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning
Jiyu Lim, Youngwoo Yoon, and Kwanghyun Park

TL;DR
This paper introduces CRISP, an autonomous framework in which a robot critiques and refines its own social behaviors using a Vision-Language Model; user studies report higher preference ratings and improved situational appropriateness across multiple robot types.
Contribution
CRISP is a novel, general framework that allows robots to self-critique and replan social behaviors autonomously using VLMs, reducing human intervention and enhancing cross-platform applicability.
Findings
Higher preference ratings in user studies
Improved situational appropriateness of behaviors
Effective self-refinement across multiple robot types
Abstract
Conventional robot social behavior generation has been limited in flexibility and autonomy, relying on predefined motions or human feedback. This study proposes CRISP (Critique-and-Replan for Interactive Social Presence), an autonomous framework where a robot critiques and replans its own actions by leveraging a Vision-Language Model (VLM) as a "human-like social critic." CRISP integrates (1) extraction of movable joints and constraints by analyzing the robot's description file (e.g., MJCF), (2) generation of step-by-step behavior plans based on situational context, (3) generation of low-level joint control code by referencing visual information (joint range-of-motion visualizations), (4) VLM-based evaluation of social appropriateness and naturalness, including pinpointing erroneous steps, and (5) iterative refinement of behaviors through reward-based search. This approach is not tied…
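The five stages listed in the abstract can be illustrated with a small critique-and-replan loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy MJCF fragment, the function names (`extract_joints`, `clamp_plan`, `stub_critic`, `refine`, `critique_and_replan`), the 0.3 threshold, and the scoring rule are all hypothetical. A rule-based stub stands in for the VLM critic, a hand-written plan stands in for stage (2), and a greedy accept-if-better loop stands in for the reward-based search.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Toy MJCF fragment for a hypothetical robot head (not taken from the paper).
MJCF = """
<mujoco>
  <worldbody>
    <body name="head">
      <joint name="head_yaw" type="hinge" range="-1.0 1.0"/>
      <joint name="head_pitch" type="hinge" range="-0.5 0.5"/>
    </body>
  </worldbody>
</mujoco>
"""

Step = Dict[str, float]  # joint name -> target position


@dataclass
class Critique:
    score: float             # higher is better
    bad_step: Optional[int]  # index of the step flagged as erroneous, if any


def extract_joints(mjcf_text: str) -> Dict[str, Tuple[float, float]]:
    """Stage 1: collect every named joint that declares a range."""
    joints = {}
    for joint in ET.fromstring(mjcf_text).iter("joint"):
        name, rng = joint.get("name"), joint.get("range")
        if name and rng:
            lo, hi = (float(v) for v in rng.split())
            joints[name] = (lo, hi)
    return joints


def clamp_plan(plan: List[Step], joints: Dict[str, Tuple[float, float]]) -> List[Step]:
    """Stage 3 stand-in: drop unknown joints and clamp targets to their range."""
    return [
        {n: min(max(t, joints[n][0]), joints[n][1]) for n, t in step.items() if n in joints}
        for step in plan
    ]


def stub_critic(plan: List[Step]) -> Critique:
    """Stage 4 stand-in for a VLM: penalize and flag steps with a target magnitude above 0.3."""
    limit = 0.3  # hypothetical "naturalness" threshold
    mags = [max((abs(v) for v in step.values()), default=0.0) for step in plan]
    worst = max(range(len(mags)), key=lambda i: mags[i]) if mags else None
    bad_step = worst if worst is not None and mags[worst] > limit else None
    score = 1.0 - sum(max(0.0, m - limit) for m in mags)
    return Critique(score=score, bad_step=bad_step)


def refine(plan: List[Step], bad_step: int) -> List[Step]:
    """Replan only the flagged step by halving its targets (a toy repair rule)."""
    new_plan = [dict(step) for step in plan]
    new_plan[bad_step] = {n: t * 0.5 for n, t in plan[bad_step].items()}
    return new_plan


def critique_and_replan(
    plan: List[Step],
    joints: Dict[str, Tuple[float, float]],
    critic: Callable[[List[Step]], Critique],
    max_iters: int = 5,
) -> Tuple[List[Step], Critique]:
    """Stage 5 stand-in: greedily keep a refinement only if its score improves."""
    best = clamp_plan(plan, joints)
    best_critique = critic(best)
    for _ in range(max_iters):
        if best_critique.bad_step is None:
            break
        candidate = clamp_plan(refine(best, best_critique.bad_step), joints)
        candidate_critique = critic(candidate)
        if candidate_critique.score <= best_critique.score:
            break
        best, best_critique = candidate, candidate_critique
    return best, best_critique


joints = extract_joints(MJCF)
# Stage 2 stand-in: a hand-written nod with an exaggerated pitch in the second step.
initial_plan = [{"head_yaw": 0.2}, {"head_pitch": 0.9}, {"head_yaw": 0.0}]
final_plan, final_critique = critique_and_replan(initial_plan, joints, stub_critic)
print(final_plan, final_critique)
```

In a real system the `critic` argument would wrap a VLM call on rendered frames of the behavior; keeping it as a plain callable that returns a score and a flagged step index is what lets the loop replan one step rather than regenerate the whole plan.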
Taxonomy
Topics: Social Robot Interaction and HRI · Robot Manipulation and Learning · Multimodal Machine Learning Applications
