Bisimulation metric for Model Predictive Control
Yutaka Shimizu, Masayoshi Tomizuka

TL;DR
The paper introduces BS-MPC, a model predictive control method that adds a bisimulation metric loss to the encoder objective to improve training stability, robustness to input noise, and computational efficiency in model-based reinforcement learning tasks.
Contribution
It proposes a new bisimulation metric loss integrated into MPC to improve encoder learning, stability, and robustness in complex environments.
Findings
BS-MPC outperforms baseline methods in continuous control tasks.
Enhanced robustness to input noise demonstrated.
Reduced training time improves computational efficiency.
Abstract
Model-based reinforcement learning has shown promise for improving sample efficiency and decision-making in complex environments. However, existing methods face challenges in training stability, robustness to noise, and computational efficiency. In this paper, we propose Bisimulation Metric for Model Predictive Control (BS-MPC), a novel approach that incorporates bisimulation metric loss in its objective function to directly optimize the encoder. This time-step-wise direct optimization enables the learned encoder to extract intrinsic information from the original state space while discarding irrelevant details and preventing the gradients and errors from diverging. BS-MPC improves training stability, robustness against input noise, and computational efficiency by reducing training time. We evaluate BS-MPC on both continuous control and image-based tasks from the DeepMind Control Suite,…
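The abstract describes a bisimulation metric loss that directly trains the encoder. The paper's exact loss is not reproduced here, so the following is only a minimal NumPy sketch of the general technique. It assumes the on-policy bisimulation formulation popularized by Deep Bisimulation for Control (DBC). Batch samples are paired by a random permutation, and the L1 distance between paired latents is regressed toward the reward difference plus γ times the 2-Wasserstein distance between predicted next-latent Gaussians. The function names, the Gaussian dynamics outputs (`next_mu`, `next_sigma`), the L1 latent distance, and the default `gamma` are all illustrative assumptions, not BS-MPC's published implementation.

```python
import numpy as np


def gaussian_w2(mu_a, sigma_a, mu_b, sigma_b):
    """Row-wise 2-Wasserstein distance between diagonal Gaussians.

    For N(mu_a, diag(sigma_a^2)) and N(mu_b, diag(sigma_b^2)) this equals
    sqrt(||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2).
    """
    return np.sqrt(
        np.sum((mu_a - mu_b) ** 2, axis=-1) + np.sum((sigma_a - sigma_b) ** 2, axis=-1)
    )


def bisimulation_loss(z, rewards, next_mu, next_sigma, gamma=0.99, perm=None, rng=None):
    """Hypothetical DBC-style on-policy bisimulation loss for one batch.

    z:          (B, D) latent states produced by the encoder
    rewards:    (B,)   rewards observed at those states
    next_mu:    (B, D) predicted mean of the next latent state (assumed Gaussian)
    next_sigma: (B, D) predicted std of the next latent state (assumed Gaussian)
    perm:       optional pairing of samples; defaults to a random permutation
    """
    if perm is None:
        rng = np.random.default_rng() if rng is None else rng
        perm = rng.permutation(z.shape[0])
    # L1 distance between each latent state and its paired partner.
    z_dist = np.sum(np.abs(z - z[perm]), axis=-1)
    # Bisimulation target: reward gap plus discounted transition gap.
    # In an autodiff framework this target would be held fixed (stop-gradient).
    r_dist = np.abs(rewards - rewards[perm])
    trans_dist = gaussian_w2(next_mu, next_sigma, next_mu[perm], next_sigma[perm])
    target = r_dist + gamma * trans_dist
    # Mean squared error pulls latent distances toward the bisimulation target.
    return float(np.mean((z_dist - target) ** 2))
```

In a real training loop, the squared error would be backpropagated into the encoder at every time step of the sampled trajectory. If the latent dynamics are deterministic, as in TD-MPC, passing zeros for `next_sigma` reduces the transition term to the Euclidean distance between predicted next latents.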
Peer Reviews
Decision: ICLR 2025 Poster
The paper provides a new perspective by integrating the bisimulation metric to address known challenges in MBRL, particularly around stability and robustness to noise. The experimental results demonstrate how BS-MPC performs well in both state-based and image-based tasks, showing increased resilience to noise and achieving faster training times due to parallel computation. The theoretical analysis adds depth by bounding cumulative rewards in the learned latent space, suggesting that BS-MPC retai
While the theoretical foundations are thorough, certain explanations, particularly on encoder stability and noise resilience, could be made clearer to broaden accessibility. The parameters require extensive tuning, which may be impractical for real-world applications lacking automated parameter selection. Additionally, the approach to introducing perturbations, particularly with visual distractions, doesn’t seem entirely effective. It would be beneficial to test perturbations that are more repre
The paper is well-written and easy to follow. The overall presentation is good. The approach is sound and makes sense to the reviewer. The experimental results look promising, compared to TD-MPC.
However, the major weakness is its novelty. 1. The whole framework is based on TD-MPC. The difference is the authors introduce the Bisimulation metric and its corresponding loss design, which are from the existing literature, as stated in the paper. 2. It is also a common way to introduce additional regularization loss terms for the encoder of model-based RL. 3. The theoretical analysis mainly borrows from the existing work and does not have any major significant result. It would be great if
The paper is clearly written and well presented. The proposed bisimulation metric seems to work well on the experiments considered, compared to TD-MPC and other baselines. The supplementary sections are comprehensive.
The novelty of the paper seems ambiguous. It seems that both on-policy bisimulation and TD-MPC methods are well studied for model based RL, and the authors plug bisimulation into TD-MPC. There are several typos in the paper. “In BS-MPC, the latent dynamics are modeled using an MLP. We also model the latent dynamics model with an MLP” I believe BS-MPC should be TD-MPC. “we sample M action sets from Gaussian distribution N (μ0, σ0) based on the initial meanμ0 and standard deviation σ0” Missing s
Taxonomy
Topics: Advanced Control Systems Optimization · Fault Detection and Control Systems · Control Systems and Identification
