Mind the Model, Not the Agent: The Primacy Bias in Model-based RL
Zhongjian Qiao, Jiafei Lyu, Xiu Li

TL;DR
This paper investigates the primacy bias in model-based reinforcement learning (MBRL), revealing that resetting the world model, rather than the agent, effectively alleviates this bias and enhances performance across various control tasks.
Contribution
The paper introduces world model resetting as a novel technique to reduce primacy bias in MBRL, demonstrating its effectiveness on multiple algorithms and benchmarks.
Findings
World model resetting significantly improves MBRL performance.
Primacy bias in MBRL is linked to the world model, not the agent.
The method is effective on continuous and discrete control tasks.
Abstract
The primacy bias in model-free reinforcement learning (MFRL), which refers to the agent's tendency to overfit to early data and lose the ability to learn from new data, can significantly decrease the performance of MFRL algorithms. Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias in MFRL. However, the primacy bias in model-based reinforcement learning (MBRL) remains unexplored. In this work, we focus on investigating the primacy bias in MBRL. We begin by observing that resetting the agent's parameters harms its performance in the context of MBRL. We further find that the primacy bias in MBRL is more closely related to the primacy bias of the world model than to that of the agent. Based on this finding, we propose world model resetting, a simple yet effective technique…
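The abstract names world model resetting but is cut off before giving any details, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a periodic reset schedule, a toy one-dimensional linear dynamics model, and made-up true dynamics; the names LinearWorldModel, train, and reset_interval are all hypothetical. The point it illustrates is that only the world model's parameters are re-initialised, while the replay buffer and the agent are left untouched.

```python
import random


class LinearWorldModel:
    """Toy stand-in for a learned dynamics model: next_state ~ w * state + b."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.reset_parameters()

    def reset_parameters(self):
        # Fresh random initialisation: everything learned so far is discarded.
        self.w = self._rng.uniform(-0.1, 0.1)
        self.b = self._rng.uniform(-0.1, 0.1)
        self.num_updates = 0

    def update(self, batch, lr=0.05):
        # One pass of SGD on the squared one-step prediction error.
        for s, s_next in batch:
            err = (self.w * s + self.b) - s_next
            self.w -= lr * err * s
            self.b -= lr * err
        self.num_updates += 1


def train(total_steps=300, reset_interval=100, seed=0):
    """Hypothetical training loop with periodic world model resets."""
    rng = random.Random(seed)
    model = LinearWorldModel(seed)
    replay_buffer = []   # kept across resets
    agent_updates = 0    # placeholder for the agent, which is never reset
    resets = 0

    for step in range(1, total_steps + 1):
        # Collect one transition from assumed true dynamics s' = 0.5 * s + 0.1.
        s = rng.uniform(-1.0, 1.0)
        replay_buffer.append((s, 0.5 * s + 0.1))

        # Fit the world model on a minibatch from the whole buffer.
        batch = rng.sample(replay_buffer, min(8, len(replay_buffer)))
        model.update(batch)
        agent_updates += 1  # the agent would train on model rollouts here

        # Assumed schedule: reset every reset_interval steps, except at the end.
        if step % reset_interval == 0 and step < total_steps:
            model.reset_parameters()  # only the world model is reset
            resets += 1

    return model, replay_buffer, agent_updates, resets


if __name__ == "__main__":
    model, buffer, agent_updates, resets = train()
    print(f"resets={resets}, buffer={len(buffer)}, w={model.w:.3f}, b={model.b:.3f}")
```

Because the buffer survives each reset, the freshly initialised model is refit on both early and recent transitions, which is the intuition behind using resets against overfitting to early data. The reset interval and the choice to keep the full buffer are assumptions of this sketch; the truncated abstract does not specify them.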
