LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation

Motonari Kambara, Koki Seno, Tomoya Kaichi, Yanan Wang, and Komei Sugiura

arXiv:2603.25481 · cs.RO · March 27, 2026

TL;DR

LILAC introduces a flow-based model that generates object-centric optical flow from images and language instructions, enabling open-loop robotic manipulation with minimal embodiment-specific data and outperforming existing methods.

Contribution

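The paper presents LILAC, a flow-based Vision-Language-Action model (VLA) that generates object-centric 2D optical flow from an RGB image and a natural language instruction, then converts that flow into a 6-DoF manipulator trajectory. Two components align instructions with object trajectories: a Semantic Alignment Loss and a Prompt-Conditioned Cross-Modal Adapter.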

Findings

01. LILAC outperformed existing methods on flow quality benchmarks.

02. LILAC achieved higher task success rates than existing methods in physical manipulation experiments.

03. LILAC generated language-conditioned trajectories while requiring only minimal embodiment-specific data.

Abstract

We address language-conditioned robotic manipulation using flow-based trajectory generation, which enables training on human and web videos of object manipulation and requires only minimal embodiment-specific data. This task is challenging, as object trajectory generation from pre-manipulation images and natural language instructions requires appropriate instruction-flow alignment. To tackle this challenge, we propose the flow-based Language Instruction-guided open-Loop ACtion generator (LILAC). This flow-based Vision-Language-Action model (VLA) generates object-centric 2D optical flow from an RGB image and a natural language instruction, and converts the flow into a 6-DoF manipulator trajectory. LILAC incorporates two key components: Semantic Alignment Loss, which strengthens language conditioning to generate instruction-aligned optical flow, and Prompt-Conditioned Cross-Modal Adapter,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI