State Combinatorial Generalization In Decision Making With Conditional Diffusion Models

Xintong Duan; Yutong He; Fahim Tajwar; Wen-Tse Chen; Ruslan Salakhutdinov; Jeff Schneider

arXiv:2501.13241·cs.LG·December 16, 2025

TL;DR

This paper introduces an approach that uses conditioned diffusion models for zero-shot generalization in combinatorial decision-making problems, outperforming traditional RL methods on unseen state combinations.

Contribution

The work formalizes the zero-shot generalization problem in combinatorial decision-making and demonstrates the effectiveness of conditioned diffusion models over traditional RL algorithms.

Findings

1. Diffusion models generalize better to unseen state combinations.
2. Behavior cloning with diffusion models outperforms RL in various environments.
3. The approach is broadly applicable across maze, driving, and multiagent tasks.

Abstract

Many real-world decision-making problems are combinatorial in nature, where states (e.g., surrounding traffic of a self-driving car) can be seen as a combination of basic elements (e.g., pedestrians, trees, and other cars). Due to combinatorial complexity, observing all combinations of basic elements in the training set is infeasible, which leads to an essential yet understudied problem of zero-shot generalization to states that are unseen combinations of previously seen elements. In this work, we first formalize this problem and then demonstrate how existing value-based reinforcement learning (RL) algorithms struggle due to unreliable value predictions in unseen states. We argue that this problem cannot be addressed with exploration alone, but requires more expressive and generalizable models. We demonstrate that behavior cloning with a conditioned diffusion model trained on successful…
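The setting described in the abstract can be illustrated with a small sketch. This is not the paper's formalization or code: the element names, the pair-sized states, and the helper names `split_combinations` and `elements_covered` are assumptions made purely for illustration. It builds states as combinations of basic elements and holds some combinations out, so that every element appears in training while the held-out combinations never do.

```python
from itertools import combinations


def split_combinations(elements, k, holdout_every=3):
    """Split all k-element combinations into seen (train) and unseen (test) sets.

    Hypothetical helper: every `holdout_every`-th combination is held out.
    """
    all_combos = list(combinations(sorted(elements), k))
    test = all_combos[::holdout_every]
    held_out = set(test)
    train = [combo for combo in all_combos if combo not in held_out]
    return train, test


def elements_covered(combos):
    """Return the set of basic elements that appear in any combination."""
    return {element for combo in combos for element in combo}


# Illustrative basic elements (assumed, loosely following the abstract's example).
elements = ["car", "pedestrian", "tree", "bike"]
train, test = split_combinations(elements, k=2)

# Zero-shot condition: test states are new combinations of already-seen elements.
unseen_combinations = set(train).isdisjoint(test)
all_elements_seen = elements_covered(test) <= elements_covered(train)
```

With this split, a policy trained only on `train` is evaluated on states in `test` whose individual elements it has seen but whose combination it has not.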

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- The problem of out-of-combination generalization is motivated clearly, and its relevance to RL is demonstrated across a few problems.
- Relative to the baseline considered, the proposed conditional diffusion model is clearly an improvement across an autonomous driving simulator, the StarCraft Multi-Agent Challenge, and various maze problems.

Weaknesses

- The proposed method of using a conditional diffusion model for planning is not detailed very clearly.
- The experiments lack any strong baseline to accurately judge the proposed method. The authors cite a few prior works on combinatorial generalization in RL, but the experiments do not include any representative baselines. It is not clear whether the benefit of the proposed approach is due to the larger model used by the diffusion model, or the specific combination of offline RL with expert da

Reviewer 02 · Rating 3 · Confidence 4

Strengths

- Combinatorial complexity is a problem that seems often overlooked in machine learning topics. I am happy to see the authors address the subject.
- I think the understanding of why diffusion models seem to generalize better is of great importance to the progress of many fields.

Weaknesses

While the paper addresses an important and timely topic, I have some questions and concerns about the technical content of the paper.

- The authors argue that a state can be composed of base elements and attributes of those base elements. While they do not explicitly specify what constitutes those base elements beyond "car" and "bike", they do mention that there are also attributes, which are not relevant to the state as it pertains to the decision making, but only to the rendering function.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. This paper proposes a novel perspective in introducing the diffusion model as a planner for RL, leveraging its combinatorial generalization capabilities. It proposes the concept of combinatorial states to illustrate the effectiveness of the diffusion model planner in this specific scenario.
2. The authors conduct experiments in several environments and test the effects of different conditioning on the diffusion model. The formulation of the experiments is quite easy to follow.

Weaknesses

1. Presentation. (1) The paper’s organization is overly segmented, creating a somewhat confusing structure. For instance, Section 4 contains only one subsection (4.1), which could be merged with Section 5, as Section 4 is fairly brief. Additionally, the introduction to diffusion models would be more appropriately placed in the “Preliminaries” rather than within the methodology description. The main methodology appears to be in Section 6, yet it occupies only a single paragraph, offering minimal

Taxonomy

Topics: Multi-Criteria Decision Making · Complex Systems and Decision Making

Methods: Diffusion · Sparse Evolutionary Training