Improving Generalization of Language-Conditioned Robot Manipulation

Chenglin Cui; Chaoran Zhu; Changjae Oh; Andrea Cavallaro

arXiv:2508.02405 · cs.RO · August 5, 2025

Open Access

TL;DR

This paper introduces a two-stage framework for language-conditioned robot manipulation that learns from few demonstrations, improving generalization and enabling zero-shot transfer in real-world environments.

Contribution

The paper proposes an instance-level semantic fusion module and a two-stage task decomposition approach that together enhance generalization in language-conditioned robot manipulation when learning from limited data.

Findings

1. Improves generalization in unseen environments
2. Enables zero-shot manipulation on real robots
3. Performs well with few demonstrations

Abstract

The control of robots for manipulation tasks generally relies on visual input. Recent advances in vision-language models (VLMs) enable the use of natural language instructions to condition visual input and control robots in a wider range of environments. However, existing methods require a large amount of data to fine-tune VLMs for operating in unseen environments. In this paper, we present a framework that learns object-arrangement tasks from just a few demonstrations. We propose a two-stage framework that divides object-arrangement tasks into a target localization stage, for picking the object, and a region determination stage for placing the object. We present an instance-level semantic fusion module that aligns the instance-level image crops with the text embedding, enabling the model to identify the target objects defined by the natural language instructions. We validate our method…
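
The idea of matching instance-level image crops against a text embedding, and of splitting the task into a pick stage and a place stage, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the crop, region, and text embeddings have already been produced by some vision-language encoder, and it swaps in plain cosine similarity for the learned fusion module. The function names (`cosine_scores`, `select_best`, `plan_pick_and_place`) and the toy vectors are hypothetical.

```python
import numpy as np


def cosine_scores(candidate_embs: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between each candidate embedding (N, D) and a text embedding (D,)."""
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb)
    return cands @ text


def select_best(candidate_embs: np.ndarray, text_emb: np.ndarray) -> int:
    """Index of the candidate whose embedding best matches the text embedding."""
    return int(np.argmax(cosine_scores(candidate_embs, text_emb)))


def plan_pick_and_place(crop_embs, region_embs, pick_text_emb, place_text_emb):
    """Hypothetical two-stage decomposition (an assumption, not the paper's model).

    Stage 1 (target localization): choose the object crop to pick.
    Stage 2 (region determination): choose the candidate region to place into.
    """
    pick_idx = select_best(crop_embs, pick_text_emb)
    place_idx = select_best(region_embs, place_text_emb)
    return pick_idx, place_idx


# Toy embeddings standing in for encoder outputs (assumed, 3-dimensional).
crop_embs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 2.0]])
region_embs = np.array([[0.5, 0.5, 0.0],
                        [0.0, 0.0, 1.0]])
pick_text_emb = np.array([0.0, 0.9, 0.1])   # closest to crop 1
place_text_emb = np.array([0.1, 0.0, 0.9])  # closest to region 1

pick_idx, place_idx = plan_pick_and_place(
    crop_embs, region_embs, pick_text_emb, place_text_emb
)
```

In this toy setup the instruction embedding for the pick stage selects the second crop and the one for the place stage selects the second region. A learned fusion module would replace the fixed cosine scoring, but the per-instance comparison against the instruction has the same shape.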

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Advanced Neural Network Applications