VLGOR: Visual-Language Knowledge Guided Offline Reinforcement Learning for Generalizable Agents

Pengsen Liu, Maosen Zeng, Nan Tang, Kaiyuan Li, Jing-Cheng Pang, Yunan Liu, Yang Yu

arXiv:2603.22892 · cs.LG · March 25, 2026

TL;DR

VLGOR introduces a framework that combines visual and language knowledge to generate imaginary environment interactions, enhancing offline reinforcement learning for better generalization to unseen tasks.

Contribution

The paper presents a novel method that integrates visual-language models with offline RL, enabling the generation of diverse, coherent rollouts for improved task generalization.

Findings

1. Achieves over 24% higher success rate on robotic manipulation benchmarks.
2. Effectively generates diverse and plausible environment rollouts.
3. Enhances agent performance on unseen tasks with novel policies.

Abstract

Combining Large Language Models (LLMs) with Reinforcement Learning (RL) enables agents to interpret language instructions more effectively for task execution. However, LLMs typically lack direct perception of the physical environment, which limits their understanding of environmental dynamics and their ability to generalize to unseen tasks. To address this limitation, we propose Visual-Language Knowledge-Guided Offline Reinforcement Learning (VLGOR), a framework that integrates visual and language knowledge to generate imaginary rollouts, thereby enriching the interaction data. The core premise of VLGOR is to fine-tune a vision-language model to predict future states and actions conditioned on an initial visual observation and high-level instructions, ensuring that the generated rollouts remain temporally coherent and spatially plausible. Furthermore, we employ counterfactual prompts to…
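To make the data-augmentation idea concrete, here is a minimal Python sketch under generic assumptions. It is not the authors' implementation. A placeholder predictor stands in for the fine-tuned vision-language model and maps an observation and an instruction to an action and a next observation. Rolling it forward one step at a time is an assumption, since the abstract only says predictions are conditioned on an initial observation and high-level instructions. Imagined transitions are then mixed with logged ones at a fixed ratio when sampling training batches. The names `Transition`, `imagine_rollout`, `augment`, `sample_batch`, `toy_predict`, and the `imagined_frac` parameter are all hypothetical. The paper's model, its counterfactual prompting, and any reward labeling of imagined data are not reproduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Obs = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    obs: Obs
    action: int
    next_obs: Obs
    instruction: str
    imagined: bool  # True if generated by the predictor, False if logged offline


# Hypothetical stand-in for a fine-tuned vision-language model:
# (current observation, instruction) -> (predicted action, predicted next observation).
Predictor = Callable[[Obs, str], Tuple[int, Obs]]


def imagine_rollout(predict: Predictor, obs0: Obs, instruction: str, horizon: int) -> List[Transition]:
    """Roll the predictor forward step by step from a seed observation (assumed design)."""
    rollout: List[Transition] = []
    obs = obs0
    for _ in range(horizon):
        action, next_obs = predict(obs, instruction)
        rollout.append(Transition(obs, action, next_obs, instruction, imagined=True))
        obs = next_obs
    return rollout


def augment(real: List[Transition], predict: Predictor,
            seeds: Sequence[Tuple[Obs, str]], horizon: int) -> List[Transition]:
    """Return the logged transitions plus one imagined rollout per (observation, instruction) seed."""
    imagined: List[Transition] = []
    for obs0, instruction in seeds:
        imagined.extend(imagine_rollout(predict, obs0, instruction, horizon))
    return list(real) + imagined


def sample_batch(data: List[Transition], batch_size: int,
                 imagined_frac: float, rng: random.Random) -> List[Transition]:
    """Sample with replacement, drawing a fixed share of the batch from imagined data."""
    real = [t for t in data if not t.imagined]
    fake = [t for t in data if t.imagined]
    n_fake = min(batch_size, round(batch_size * imagined_frac)) if fake else 0
    n_real = batch_size - n_fake
    return rng.choices(real, k=n_real) + rng.choices(fake, k=n_fake)


def toy_predict(obs: Obs, instruction: str) -> Tuple[int, Obs]:
    # Placeholder dynamics: always take action 1 and shift every coordinate by +1.
    return 1, tuple(x + 1.0 for x in obs)


# Usage: one logged transition plus a 3-step imagined rollout for a new instruction.
logged = [Transition((0.0,), 0, (0.5,), "push the block", imagined=False)]
data = augment(logged, toy_predict, [((0.0,), "pick up the red block")], horizon=3)
batch = sample_batch(data, batch_size=8, imagined_frac=0.25, rng=random.Random(0))
```

Tagging each transition with its source keeps the real-to-imagined mixing ratio explicit and tunable. A practical offline RL pipeline would also need rewards or return labels for the imagined transitions, which this sketch deliberately leaves out.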

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning