Scalable Offline Model-Based RL with Action Chunks

Kwanyoung Park; Seohong Park; Youngwoon Lee; Sergey Levine

arXiv:2512.08108 · cs.LG · December 10, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces MAC, a scalable offline model-based RL method using action chunks to reduce model errors and rejection sampling to prevent exploitation, achieving state-of-the-art results on complex long-horizon tasks.

Contribution

The paper proposes action-chunk models and rejection sampling techniques to improve the scalability and performance of offline model-based RL in complex tasks.

Findings

01 MAC outperforms existing offline model-based RL algorithms on long-horizon tasks.

02 Action chunks reduce compounding errors in long-term predictions.

03 Rejection sampling prevents model exploitation from out-of-distribution actions.
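One common way to realize rejection sampling from a behavior policy is best-of-N selection: draw several candidate action chunks from the behavior policy, score each one, and keep the highest-scoring candidate. The sketch below illustrates that idea only. Whether MAC uses exactly this selection rule is an assumption, and `select_chunk`, `toy_behavior_sampler`, and `toy_score` are hypothetical names and toy stand-ins, not the paper's models.

```python
import numpy as np


def select_chunk(state, sample_behavior_chunks, score_chunk, num_samples=32):
    """Best-of-N selection over action chunks drawn from a behavior policy.

    sample_behavior_chunks(state, num_samples) -> array (N, chunk_len, action_dim)
    score_chunk(state, chunk) -> scalar estimate of how good the chunk is
    Both callables are hypothetical placeholders for learned components.
    """
    candidates = sample_behavior_chunks(state, num_samples)
    scores = np.array([score_chunk(state, chunk) for chunk in candidates])
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])


# Toy stand-ins so the sketch runs end to end (assumptions, not MAC's models).
rng = np.random.default_rng(0)
CHUNK_LEN, ACTION_DIM = 4, 2


def toy_behavior_sampler(state, num_samples):
    # Pretend behavior policy: actions clustered around a state-dependent mean.
    mean = np.tanh(state[:ACTION_DIM])
    noise = 0.1 * rng.standard_normal((num_samples, CHUNK_LEN, ACTION_DIM))
    return mean + noise


def toy_score(state, chunk):
    # Pretend model-plus-value estimate: prefer chunks that move the
    # (toy) predicted state toward the origin.
    predicted_state = state[:ACTION_DIM] + chunk.sum(axis=0)
    return -float(np.sum(predicted_state ** 2))


state = np.array([0.3, -0.2])
chunk, score = select_chunk(state, toy_behavior_sampler, toy_score, num_samples=32)
print(chunk.shape, score)
```

Because every candidate comes from the behavior policy, the selected chunk stays in-distribution by construction; the score only ranks candidates and is never maximized directly over the raw action space.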

Abstract

In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion, can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. Model-based value expansion fits an on-policy value function using length-n imaginary rollouts generated by the current policy and a learned dynamics model. While larger n reduces bias in value bootstrapping, it amplifies accumulated model errors over long horizons, degrading future predictions. We address this trade-off with an action-chunk model that predicts a future state from a sequence of actions (an "action chunk") instead of a single action, which reduces compounding errors. In addition, instead of directly training a policy to maximize rewards, we employ rejection sampling from an expressive behavioral action-chunk policy, which prevents model exploitation from…
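The trade-off described in the abstract can be written out with the standard n-step value target. The notation below is chosen here for illustration and is an assumption, not the paper's: $f$ is a single-step dynamics model, $f_n$ an action-chunk model, $\hat{r}$ a predicted reward, $\gamma$ the discount factor, and $V$ the learned value function.

```latex
% Single-step rollout: n sequential model calls, each fed the previous prediction,
% so prediction errors can accumulate along the rollout.
\hat{s}_{t+i+1} = f(\hat{s}_{t+i}, a_{t+i}), \qquad i = 0, \dots, n-1, \qquad \hat{s}_t = s_t

% Action-chunk model: one call maps the current state and n actions to the state n steps ahead.
\hat{s}_{t+n} = f_n(s_t, a_{t:t+n-1})

% n-step value expansion target, bootstrapping from the predicted state.
y_t = \sum_{i=0}^{n-1} \gamma^{i} \hat{r}_{t+i} + \gamma^{n} V(\hat{s}_{t+n})
```

In the single-step case the bootstrap state $\hat{s}_{t+n}$ is the result of composing $f$ with itself n times, whereas the chunk model produces it directly from the true state $s_t$. How MAC obtains the reward terms is not stated in this excerpt.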

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

[S1] MAC's action-chunk model, combined with in-distribution rejection sampling, offers a promising solution to the problems of compounding model error (a key issue in MBRL) and OOD model exploitation (a key issue in offline RL). It has achieved excellent empirical performance in long-horizon and large-scale benchmark tests.

Weaknesses

[W1] The main problem lies in the scalability of sampling with increasing action chunk length $n$. As $n$ increases, the dimensionality of the action chunk space also increases, making it exponentially more difficult to find high-value, in-distribution action sequences by rejection sampling. This sampling challenge may weaken the "return maximization" aspect of RL and may cause the agent's final behavior to be closer to behavior cloning (BC) rather than RL.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The issue they address is relevant, the results seem strong, and the paper is reasonably clear in explaining a complicated method.

Weaknesses

**W1.** There are so many moving parts, including training 6 models. This is expensive, and probably also more difficult to implement and debug compared to many other methods.

**W2.** The main benchmark in offline RL is probably D4RL. The authors have not provided results on this.

**W3.** As far as I understand the proposed method is roughly N times more expensive than prior algorithms (?). Furthermore, N is high (~32), and they potentially need larger models since action-chunk inputs to the mod

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. Scaling up RL methods to tackle long-horizon problems is very important.
2. The idea of adopting action chunks to address the dilemma between long horizons and a small number of function calls is quite straightforward.
3. The proposed method (MAC) is quite simple to implement, so this paper offers a good starting point for future research in this direction.
4. The empirical performance on robotic manipulation tasks in OGBench is also strong, offering potential solutions for robotics.

Weaknesses

I have a few concerns regarding the novelty and the complexity of the proposed method:

1. `Novelty of Action Chunks` The idea of adopting action chunks to extend RL horizons is not particularly novel. In fact, a recent work [1] has already explored a model-free version of this idea. Therefore, the novelty of this paper's contribution is somewhat limited. However, since action chunking appears to be a principled and simple approach to addressing the dilemma highlighted by the authors, I would not

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)