Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning

Enze Pan

arXiv:2601.04695 · cs.AI · April 21, 2026

TL;DR

Tape is a benchmark designed to evaluate reinforcement learning algorithms' ability to generalize across latent rule-shifts in dynamics, providing a controlled environment to diagnose robustness and brittleness.

Contribution

This paper introduces Tape, a novel controlled benchmark isolating latent rule-shift in dynamics for evaluating RL generalization and robustness.

Findings

01. RL algorithms show a consistent performance drop from in-distribution (ID) to out-of-distribution (OOD) settings.

02. Fragility to latent-law changes appears even in simple deterministic 1D environments.

03. Tape enables detailed diagnostics of policy robustness and adaptation to rule shifts.
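To make "latent rule shift in a 1D deterministic environment" concrete, here is a minimal sketch. It assumes elementary cellular automata with Wolfram rule numbering and periodic boundaries; this summary does not specify Tape's actual rule sets, action interface, or rewards, so `eca_step`, `rollout`, and the rule numbers below are illustrative choices, not the paper's implementation. The point is only that the observation format stays identical while the hidden update law changes.

```python
import numpy as np

def eca_step(state: np.ndarray, rule: int) -> np.ndarray:
    """Apply one step of an elementary cellular automaton (assumed model).

    `rule` is a Wolfram rule number in [0, 255]; boundaries are periodic.
    """
    s = state.astype(np.int64)
    left = np.roll(s, 1)    # left[i]  == s[i - 1]
    right = np.roll(s, -1)  # right[i] == s[i + 1]
    # Encode each 3-cell neighborhood as an index in 0..7.
    idx = 4 * left + 2 * s + right
    # Bit k of the rule number is the next state for neighborhood k.
    table = (rule >> np.arange(8)) & 1
    return table[idx]

def rollout(state: np.ndarray, rule: int, steps: int) -> np.ndarray:
    """Return the tape after `steps` deterministic updates under `rule`."""
    for _ in range(steps):
        state = eca_step(state, rule)
    return state

# Same initial tape, same observation format, different latent rule.
tape = np.array([0, 0, 0, 1, 0, 0, 0])
train_like = rollout(tape, rule=90, steps=1)  # hypothetical "ID" rule
shifted = rollout(tape, rule=30, steps=1)     # hypothetical "OOD" rule
```

An agent that only ever saw rollouts under one rule receives observations of the same shape under the other, yet the consequences of any fixed behavior differ, which is the kind of shift the findings above describe.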

Abstract

Out-of-distribution generalization in reinforcement learning is hard to diagnose when benchmark shifts mix dynamics, observations, goals, and rewards. We address this with Tape, a controlled benchmark that isolates latent rule-shift in dynamics while keeping the observation-action interface fixed. The protocol combines deterministic splits, 20-seed replication, bootstrap uncertainty reporting, and continuous metrics for sparse-success regimes. Across baseline families, we find a consistent ID-to-OOD drop and strong heterogeneity across stable/periodic/chaotic rules. Importantly, this fragility appears even in an intentionally simple 1D deterministic setting, suggesting that many current RL algorithms remain brittle to latent-law changes under minimal confounds. To calibrate strict success, we report a protocol-matched true-dynamics random-shooting reference (p_oracle ≈ 0.187)…
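As an illustration of what "20-seed replication" with "bootstrap uncertainty reporting" can look like, the sketch below computes a percentile bootstrap confidence interval for a mean over per-seed scores. The abstract does not say which bootstrap variant, resample count, or confidence level the paper uses, so `bootstrap_ci`, `n_boot`, and `alpha` are assumptions, and the scores are synthetic placeholders rather than results from the paper.

```python
import numpy as np

def bootstrap_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-seed scores (assumed variant)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample seeds with replacement: one row per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lo, hi

# Synthetic placeholder scores for 20 seeds; NOT data from the paper.
demo_rng = np.random.default_rng(123)
id_scores = demo_rng.uniform(0.6, 0.9, size=20)
ood_scores = demo_rng.uniform(0.2, 0.6, size=20)

# Seeds are paired, so bootstrap the per-seed ID-to-OOD gap directly.
gap_mean, gap_lo, gap_hi = bootstrap_ci(id_scores - ood_scores)
print(f"ID-to-OOD gap: {gap_mean:.3f} [{gap_lo:.3f}, {gap_hi:.3f}]")
```

Resampling the paired per-seed gap, rather than the ID and OOD scores separately, keeps seed-level variation from inflating the interval; whether the paper pairs seeds this way is not stated in the abstract.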
