Real-Time Robot Execution with Masked Action Chunking
Haoxuan Wang, Gengyu Zhang, Yan Yan, Yuzhang Shang, Ramana Rao Kompella, Gaowen Liu

TL;DR
This paper introduces REMAC, a method for improving real-time robot execution by learning corrective adjustments to handle intra-chunk inconsistencies during asynchronous inference, resulting in more reliable and faster task performance.
Contribution
We propose REMAC, a novel approach that learns corrective adjustments for action chunks to enhance robustness in asynchronous robot inference, addressing intra-chunk inconsistency.
Findings
REMAC improves task completion rates in real-world robot experiments.
The method maintains robustness under varying delays.
It enables faster and more reliable robot execution.
Abstract
Real-time execution is essential for cyber-physical systems such as robots. These systems operate in dynamic real-world environments where even small delays can undermine responsiveness and compromise performance. Asynchronous inference has recently emerged as a system-level paradigm for real-time robot manipulation, enabling the next action chunk to be predicted while the current one is being executed. While this approach achieves real-time responsiveness, naive integration often results in execution failure. Previous methods attributed this failure to inter-chunk discontinuity and developed test-time algorithms to smooth chunk boundaries. In contrast, we identify another critical yet overlooked factor: intra-chunk inconsistency, where the robot's executed action chunk partially misaligns with its current perception. To address this, we propose REMAC, which learns corrective…
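The timing behind asynchronous inference can be illustrated with a small sketch. It is not the authors' implementation; the function name `async_schedule` and the scheduling rule (each new chunk arrives exactly `delay` control steps after its observation was captured, and the robot then runs the remaining actions of that chunk) are assumptions chosen for illustration. Under that assumed rule, every executed action is conditioned on an observation that is at least `delay` steps old.

```python
def async_schedule(num_chunks: int, horizon: int, delay: int) -> list[tuple[int, int, int]]:
    """Return (time_step, chunk_id, index_in_chunk) for each executed action.

    Hypothetical model: chunk k is conditioned on the observation captured at
    step k * (horizon - delay) and becomes available `delay` steps later, so
    only actions with index >= delay from that chunk are executed.
    """
    if not 0 <= delay < horizon:
        raise ValueError("delay must satisfy 0 <= delay < horizon")
    schedule = []
    for k in range(num_chunks):
        obs_t = k * (horizon - delay)  # step at which chunk k's observation is captured
        for i in range(delay, horizon):
            # The action at index i runs at step obs_t + i, i.e. i steps after
            # the observation it was predicted from.
            schedule.append((obs_t + i, k, i))
    return schedule


if __name__ == "__main__":
    for t, chunk_id, idx in async_schedule(num_chunks=3, horizon=8, delay=2):
        print(f"t={t:2d}  chunk={chunk_id}  index={idx}  observation_age={idx}")
```

In this toy model the index within the chunk equals the age of the observation at execution time, so a larger delay means the executed actions were all predicted from older perception.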
Peer Reviews
Decision·ICLR 2026 Poster
- The paper makes solid contributions to an important practical problem. Asynchronous inference is a natural system-level solution for real-time execution. Identifying intra-chunk inconsistency as a distinct failure mode adds value.
- The paper does a thorough experimental analysis, comparing with the state-of-the-art method RTC, and doing ablation studies on each component of the proposed method.
- Figure 1(c) provides a great concrete example of the problem the paper is aiming to solve.
- The paper cites intra-chunk inconsistency as a core motivation for the proposed method, but doesn't provide any direct evidence that this is a major issue.
- Is there a way to apply this idea to policy classes other than flow-matching?
- I don't understand the residual alignment term. Don't the two $\tilde{u}$ terms in eq (4) cancel out, making it equivalent to (2)? Is there a typo somewhere, or am I missing something?
- The method requires specifying d_max as a hyperparameter during training.
- Identifies and formalizes intra-chunk inconsistency as a distinct source of degradation in asynchronous execution.
- Simple and computationally efficient method that is compatible with existing VLA architectures.
- Strong empirical evidence across simulation and real-robot settings, including latency sweeps and ablation studies.
- No additional inference latency, in contrast to recent test-time smoothing approaches (e.g., RTC).
- Parameter-efficient finetuning strategy that preserves the backb
- The method assumes access to accurate and bounded delay estimates. It is unclear how robust the approach is when latency measurements are noisy, rapidly fluctuating, or adversarially spiky.
- Training samples delays uniformly, but real-world latency tends to be bursty and temporally correlated. Additional evaluation under realistic network- and compute-induced delay profiles would strengthen the claims.
- Finetuning with masked actions may shift the behavior of the underlying VLA model. The pa
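The contrast this review draws between uniformly sampled delays and bursty, temporally correlated ones can be made concrete with a toy generator. This is only a sketch: the two-state Markov chain, the parameters `p_enter` and `p_exit`, and all function names are assumptions introduced for illustration, not the paper's training or evaluation protocol.

```python
import random


def uniform_delays(n: int, d_max: int, rng: random.Random) -> list[int]:
    """Independent delays drawn uniformly from {0, ..., d_max}."""
    return [rng.randint(0, d_max) for _ in range(n)]


def bursty_delays(n: int, d_max: int, rng: random.Random,
                  p_enter: float = 0.05, p_exit: float = 0.3) -> list[int]:
    """Delays from a hypothetical two-state (calm/burst) Markov chain.

    Calm steps draw from the lower half of the range, burst steps from the
    upper half, so high delays cluster in time.
    """
    if d_max < 1:
        raise ValueError("d_max must be at least 1")
    delays, in_burst = [], False
    for _ in range(n):
        if in_burst:
            in_burst = rng.random() >= p_exit   # stay in burst unless we exit
        else:
            in_burst = rng.random() < p_enter   # occasionally enter a burst
        lo, hi = (d_max // 2 + 1, d_max) if in_burst else (0, d_max // 2)
        delays.append(rng.randint(lo, hi))
    return delays


def lag1_autocorr(xs: list[int]) -> float:
    """Lag-1 autocorrelation; near 0 for independent samples."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


if __name__ == "__main__":
    rng = random.Random(0)
    print("uniform:", round(lag1_autocorr(uniform_delays(5000, 8, rng)), 3))
    print("bursty: ", round(lag1_autocorr(bursty_delays(5000, 8, rng)), 3))
```

Both generators stay within the same bounded range; they differ in how strongly one delay predicts the next, which is the property the review asks to be evaluated.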
1. REMAC can adapt to various inference delays d with a single training process, without needing to retrain for each delay. The use of LoRA modules maintains model performance while reducing training overhead.
2. Flow matching enables the model to learn fine-grained continuity between action prefixes and optimal future actions, capturing the dependency between earlier and later actions.
3. REMAC achieves strong results across both simulated and real-world benchmarks.
1. REMAC has structural requirements on the dataset, which needs to contain diverse future action sequences from the same observation. With this requirement, the model can learn to correct deviating action prefixes resulting from inference delay. Without local adjustment samples in the dataset, REMAC may perform no better than behavioral cloning.
2. During inference, the agent must know how long inference takes in order to select an appropriate prefix length d from the previous action chunk. I
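The point about needing to know the inference time in order to pick a prefix length d can be sketched as follows. This is a minimal illustration under stated assumptions: the conversion rule (round latency up to whole control steps, then clamp to d_max), the boolean prefix mask, and the names `prefix_length` and `prefix_mask` are hypothetical and are not taken from the paper.

```python
import math


def prefix_length(latency_s: float, control_period_s: float, d_max: int) -> int:
    """Convert a measured inference latency into a prefix length in control steps.

    Assumed rule: round up to whole steps, then clamp to d_max, the largest
    delay assumed to have been seen during training.
    """
    if latency_s < 0 or control_period_s <= 0 or d_max < 0:
        raise ValueError("latency and d_max must be >= 0, control period > 0")
    steps = math.ceil(latency_s / control_period_s)
    return min(steps, d_max)


def prefix_mask(horizon: int, d: int) -> list[bool]:
    """True for the first d positions of a chunk (the already-committed prefix)."""
    if not 0 <= d <= horizon:
        raise ValueError("d must satisfy 0 <= d <= horizon")
    return [i < d for i in range(horizon)]


if __name__ == "__main__":
    d = prefix_length(latency_s=0.09, control_period_s=0.02, d_max=8)
    print(d, prefix_mask(horizon=10, d=d))
```

Clamping silently hides the case where the measured latency exceeds the trained range; a real system would probably want to log or otherwise handle that case rather than ignore it.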
Taxonomy
Topics: Real-Time Systems Scheduling · Robot Manipulation and Learning · Reinforcement Learning in Robotics
