CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation
Berk Çiçek, Mert K. Er, Ozgur S. Oguz

TL;DR
CoRAL is a hierarchical framework that combines LLMs, vision-language models, and online system identification to enable adaptive, contact-rich robotic manipulation with high success rates.
Contribution
It introduces a modular, zero-shot planning approach that decouples high-level reasoning from low-level control, incorporating real-time adaptation and memory for contact-rich tasks.
Findings
CoRAL achieves over 50% higher success rates than state-of-the-art baselines.
The neuro-symbolic adaptation loop improves environmental parameter estimation.
Real-world experiments validate CoRAL's effectiveness in challenging tasks.
Abstract
While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction…
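The abstract's idea of an LLM acting as a cost designer for MPPI can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy single-integrator dynamics in `step`, the hand-written `make_cost` (a stand-in for a cost function an LLM might synthesize), and all parameter values. It shows only the generic MPPI pattern: sample perturbed control sequences, score each rollout with the supplied cost, and update the nominal plan with an exponentially weighted average of the perturbations.

```python
import numpy as np


def make_cost(goal, w_goal=10.0, w_effort=0.01):
    """Hypothetical stand-in for an LLM-designed objective: a weighted sum of terms."""
    goal = np.asarray(goal, dtype=float)

    def cost(state, control):
        # Squared distance to the goal plus a small control-effort penalty.
        return (w_goal * np.sum((state - goal) ** 2, axis=-1)
                + w_effort * np.sum(control ** 2, axis=-1))

    return cost


def step(state, control, dt=0.1):
    """Toy dynamics (assumption): the state is a 2D position driven by a velocity command."""
    return state + dt * control


def mppi_update(state, nominal, cost, rng, n_samples=256, sigma=0.5, lam=0.5):
    """One MPPI iteration: returns the updated nominal control sequence."""
    horizon, dim = nominal.shape
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon, dim))
    controls = nominal[None] + noise
    states = np.repeat(state[None], n_samples, axis=0)
    total = np.zeros(n_samples)
    for t in range(horizon):
        # Roll all samples forward in parallel and accumulate their costs.
        states = step(states, controls[:, t])
        total += cost(states, controls[:, t])
    # Exponential weighting; subtracting the minimum keeps exp() numerically stable.
    weights = np.exp(-(total - total.min()) / lam)
    weights /= weights.sum()
    return nominal + np.einsum("k,ktd->td", weights, noise)


def run(goal=(1.0, 0.5), steps=80, horizon=15, seed=0):
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = np.random.default_rng(seed)
    cost = make_cost(goal)
    state = np.zeros(2)
    nominal = np.zeros((horizon, 2))
    for _ in range(steps):
        nominal = mppi_update(state, nominal, cost, rng)
        state = step(state, nominal[0])
        nominal = np.roll(nominal, -1, axis=0)
        nominal[-1] = 0.0  # pad the shifted plan with a zero control
    return state


if __name__ == "__main__":
    print("final state:", run())
```

In this sketch the planner never changes when the task changes; only the function returned by `make_cost` does. That separation is the point being illustrated: a high-level reasoner can swap in a different objective while the low-level sampling-based controller stays fixed.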
