LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation
Motonari Kambara, Koki Seno, Tomoya Kaichi, Yanan Wang, and Komei Sugiura

TL;DR
The paper introduces LILAC, a flow-based model that generates object-centric optical flow from an image and a language instruction. This enables open-loop robotic manipulation with minimal embodiment-specific data, and the model outperforms existing methods in both flow quality and physical task success.
Contribution
The paper presents LILAC, a flow-based vision-language-action model that aligns language instructions with object trajectories for robotic manipulation. It relies on two components: Semantic Alignment Loss and Prompt-Conditioned Cross-Modal Adapter.
Findings
LILAC outperformed existing methods on flow-quality benchmarks.
LILAC achieved higher task success rates than existing methods in physical manipulation experiments.
Language-conditioned trajectory generation was effective with only minimal embodiment-specific data.
Abstract
We address language-conditioned robotic manipulation using flow-based trajectory generation, which enables training on human and web videos of object manipulation and requires only minimal embodiment-specific data. This task is challenging, as object trajectory generation from pre-manipulation images and natural language instructions requires appropriate instruction-flow alignment. To tackle this challenge, we propose the flow-based Language Instruction-guided open-Loop ACtion generator (LILAC). This flow-based Vision-Language-Action model (VLA) generates object-centric 2D optical flow from an RGB image and a natural language instruction, and converts the flow into a 6-DoF manipulator trajectory. LILAC incorporates two key components: Semantic Alignment Loss, which strengthens language conditioning to generate instruction-aligned optical flow, and Prompt-Conditioned Cross-Modal Adapter,…
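The abstract says the generated 2D flow is converted into a 6-DoF trajectory but, in the portion shown here, does not say how. As a rough illustration only, the sketch below shows one common way such a conversion can be done: back-project tracked object points to 3D with a pinhole camera model, then fit a rigid transform per time step with the Kabsch algorithm. The per-point depth input, the camera intrinsics, and the function names `backproject`, `kabsch`, and `flow_to_poses` are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def backproject(pixels, depth, fx, fy, cx, cy):
    """Lift (N, 2) pixel coordinates with (N,) depths to (N, 3) camera-frame points.

    Assumes a pinhole camera; depth availability is an assumption of this sketch.
    """
    x = (pixels[:, 0] - cx) * depth / fx
    y = (pixels[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def flow_to_poses(flow, depths, fx, fy, cx, cy):
    """Convert object-centric flow into per-step object poses relative to step 0.

    flow:   (T, N, 2) pixel positions of N object points over T steps (hypothetical layout)
    depths: (T, N) depth of each point at each step (assumed to be available)
    Returns a list of T (R, t) pairs, i.e. a 6-DoF object trajectory.
    """
    ref = backproject(flow[0], depths[0], fx, fy, cx, cy)
    poses = []
    for k in range(len(flow)):
        cur = backproject(flow[k], depths[k], fx, fy, cx, cy)
        poses.append(kabsch(ref, cur))
    return poses
```

Each returned pair describes the object's motion relative to its pre-manipulation pose. Turning those object poses into manipulator commands would additionally require a grasp transform and a camera-to-robot calibration, both of which are outside this sketch.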
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
