Synthetic Data is Sufficient for Zero-Shot Visual Generalization from Offline Data

Ahmet H. Güzel; Ilija Bogunovic; Jack Parker-Holder

arXiv:2508.12356·cs.CV·August 19, 2025

Open Access · 3 Reviews

TL;DR

This paper demonstrates that generating synthetic data via diffusion models enhances zero-shot visual generalization in offline reinforcement learning, reducing overfitting and improving performance without altering existing algorithms.

Contribution

It introduces a simple two-step method to augment offline data with synthetic samples using diffusion models, improving generalization in visual RL tasks.

Findings

01

Synthetic data improves zero-shot generalization.

02

Method reduces the generalization gap.

03

No changes needed in existing RL algorithms.
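The "generalization gap" in finding 02 is commonly taken to be the difference between average return on the training environments and average return on unseen ones. The sketch below assumes that common definition, which may differ from the exact metric used in the paper; the function name and the numbers are illustrative placeholders, not results from the paper.

```python
from statistics import mean


def generalization_gap(train_returns, test_returns):
    """Mean return on training environments minus mean return on unseen ones.

    Assumed definition for illustration; a smaller gap means the policy
    loses less performance when moved to unseen environments.
    """
    return mean(train_returns) - mean(test_returns)


# Made-up episode returns, for illustration only (not from the paper).
baseline_gap = generalization_gap([100.0, 90.0, 110.0], [40.0, 60.0, 50.0])
augmented_gap = generalization_gap([95.0, 100.0, 105.0], [70.0, 80.0, 90.0])
print(baseline_gap, augmented_gap)
```

Under this definition, a method can shrink the gap either by raising test return or by lowering train return, so the gap is best read alongside the raw test return.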

Abstract

Offline reinforcement learning (RL) offers a promising framework for training agents using pre-collected datasets without the need for further environment interaction. However, policies trained on offline data often struggle to generalise due to limited exposure to diverse states. The complexity of visual data introduces additional challenges such as noise, distractions, and spurious correlations, which can misguide the policy and increase the risk of overfitting if the training data is not sufficiently diverse. Indeed, this makes it challenging to leverage vision-based offline data in training robust agents that can generalize to unseen environments. To solve this problem, we propose a simple approach generating additional synthetic training data. We propose a two-step process, first augmenting the originally collected offline data to improve zero-shot generalization by introducing…
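The first step described in the abstract, augmenting the collected offline frames, might look like the NumPy sketch below. It uses the three augmentations that Reviewer 02 names (rotation, color jittering, color cutout). The function names, the 90-degree rotation, the jitter strength, and the patch size are all assumptions made for illustration, not the authors' settings. The second step, diffusion-based upsampling, is not sketched here.

```python
import numpy as np


def random_rotate(img, rng):
    """Rotate by a random multiple of 90 degrees (assumes square frames)."""
    return np.rot90(img, k=int(rng.integers(0, 4)), axes=(0, 1))


def color_jitter(img, rng, strength=0.2):
    """Scale each RGB channel by a random factor in [1 - strength, 1 + strength]."""
    scale = rng.uniform(1.0 - strength, 1.0 + strength, size=(1, 1, 3))
    return np.clip(img.astype(np.float32) * scale, 0, 255).astype(np.uint8)


def color_cutout(img, rng, size=16):
    """Overwrite a random size x size patch with a single random color."""
    h, w, _ = img.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = img.copy()
    out[top:top + size, left:left + size] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return out


def augment_dataset(frames, rng, copies=1):
    """Return the original frames followed by `copies` augmented variants of each."""
    parts = [frames]
    for _ in range(copies):
        batch = np.stack(
            [color_cutout(color_jitter(random_rotate(f, rng), rng), rng) for f in frames]
        )
        parts.append(batch)
    return np.concatenate(parts, axis=0)


# Hypothetical usage: 4 random 64x64 RGB frames standing in for offline observations.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
augmented = augment_dataset(frames, rng, copies=2)
print(augmented.shape)
```

Because only the observations change, the actions and rewards stored with each frame can be reused as-is. This is consistent with the claim that no changes to the RL algorithm are needed.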

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The method is simple and easy to understand.
- The proposed method improves the generalization performance of offline visual RL methods.
- The method also yields an additional improvement when given a small subset of data with distractions.
- The method only changes the dataset and therefore does not depend on the particular RL algorithm, so in theory it can be used to improve any offline RL algorithm.

Weaknesses

- As the authors note, the method requires tuning of the data augmentation parameters, which limits its applicability.
- The experiments only include DrQ and CQL. Since this paper deals with extending the data and can be applied to many different methods, it would be more compelling if it evaluated more of them, as in [SynthER](https://arxiv.org/abs/2303.06614), e.g. IQL, TD3+BC, EDAC.

Writing

- Figures 4, 5, 6 are too large
- JS divergence heatmaps in the figures throughout

Reviewer 02 · Rating 5 · Confidence 3

Strengths

1. The two-step approach effectively combines data augmentation and diffusion model-based upsampling, significantly reducing the generalization gap in both continuous (V-D4RL) and discrete (Procgen) control tasks. This leads to improved performance in unseen environments without requiring modifications to existing model-free offline RL algorithms.
2. By augmenting the original dataset and generating synthetic data in the latent space, the method broadens the distribution of training data. This

Weaknesses

Although the method shows promising results in benchmarks like V-D4RL and Procgen, these are controlled environments. It’s unclear how the method would perform in more complex, real-world scenarios where the variety of unseen situations is vastly greater than in benchmark tests. The effectiveness of the approach depends heavily on specific augmentation techniques like rotation, color jittering, and color cutout. The results may vary significantly if the distribution of unseen environments does

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper presents an innovative approach by combining two complementary data augmentation strategies: classic transformations and generative model-based data synthesis. This integration effectively leverages both the reliability of traditional augmentation methods and the diversity potential of generative modeling, providing a more comprehensive solution to the data diversity challenge in offline RL.
2. The implementation of diffusion model-based data synthesis in latent space, rather than

Weaknesses

1. The discussion and analysis of chosen data augmentation techniques in Section 3.2 lacks sufficient depth. The authors should provide empirical evidence for their augmentation choices and properly reference established techniques from online RL literature, such as DrAC[1], SVEA[2], and the comprehensive survey[3]. The current treatment of augmentation strategies is superficial and fails to leverage valuable insights from prior work.
2. The proposed Generalization Performance m

Taxonomy

Topics: Cell Image Analysis Techniques · AI in cancer detection · Advanced Vision and Imaging