The Curse of Diversity in Ensemble-Based Exploration

Zhixuan Lin; Pierluca D'Oro; Evgenii Nikishin; Aaron Courville

arXiv:2405.04342 · cs.LG · May 8, 2024

TL;DR

Training diverse ensembles in deep reinforcement learning can unexpectedly harm the performance of individual ensemble members, because each member learns from shared data that it mostly did not generate itself; representation learning methods such as CERL can help mitigate this problem.

Contribution

This paper identifies the 'curse of diversity' in ensemble-based exploration and proposes a novel representation learning approach, CERL, to address it.

Findings

01. Ensemble diversity can impair individual agent performance.

02. Larger replay buffers or smaller ensembles either fail to reliably solve the issue or give up the advantages of ensembling.

03. Representation learning via CERL helps counter the curse.

Abstract

We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members in learning from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to…
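The first cause named in the abstract comes down to simple arithmetic: when N members add data at the same rate to one shared buffer, each member's own transitions make up only about 1/N of what it trains on. The sketch below illustrates this under loudly simplifying assumptions (round-robin collection of one transition per member, a single FIFO buffer, and the hypothetical helper `self_data_fraction`); it is not the paper's training setup.

```python
from collections import deque


def self_data_fraction(num_members: int, buffer_capacity: int,
                       total_steps: int, member: int = 0) -> float:
    """Share of a shared FIFO replay buffer that `member` generated itself.

    Hypothetical toy setup (assumption, not the paper's): members take turns
    collecting one transition each (round-robin), and every transition goes
    into one shared buffer that evicts its oldest entries when full.
    """
    buffer = deque(maxlen=buffer_capacity)  # oldest transitions are dropped first
    for step in range(total_steps):
        collector = step % num_members      # round-robin data collection
        buffer.append(collector)            # store only which member produced it
    own = sum(1 for owner in buffer if owner == member)
    return own / len(buffer)
```

For example, `self_data_fraction(1, 1_000, 5_000)` returns 1.0, whereas `self_data_fraction(10, 1_000, 5_000)` and `self_data_fraction(10, 4_000, 5_000)` both return 0.1. In this toy model, enlarging the buffer leaves the self-generated share unchanged, while shrinking the ensemble raises it at the cost of fewer, and therefore less diverse, members.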

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 8: accept, good paper · Confidence 4

Strengths

Presentation: The paper is extremely clear and excellently written. The problem, motivation, and experiments are articulated very clearly. I like that they look introspectively at their own experiments and reason clearly about what can be inferred/concluded from their experiments without making unreasonable intellectual leaps. Contributions: 1. They show that perhaps the aggregation/majority voting aspect of ensembling methods may contribute to improved performance more than previously attribu…
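Reviewer 01 credits part of ensembling's benefit to how the members' decisions are aggregated. One common aggregation rule is a majority vote over each member's greedy action. The sketch below shows that rule using a hypothetical helper, `majority_vote_action`; it is a generic illustration and is not taken from the paper's code.

```python
from collections import Counter
from typing import Sequence


def majority_vote_action(q_values: Sequence[Sequence[float]]) -> int:
    """Return the action that the largest number of ensemble members pick greedily.

    q_values[k][a] is member k's Q-value estimate for action a (assumed layout).
    Ties go to the action whose first vote came earliest, because
    Counter.most_common orders equal counts by first insertion.
    """
    # Each member votes for its own argmax action (first index wins on equal Q-values).
    greedy = [max(range(len(qs)), key=qs.__getitem__) for qs in q_values]
    return Counter(greedy).most_common(1)[0][0]
```

With three members whose greedy actions are 1, 1 and 0, the vote returns action 1, even though the third member's single Q-value of 5.0 in `[[1.0, 2.0, 0.0], [0.5, 3.0, 1.0], [5.0, 0.0, 0.0]]` is the largest estimate overall.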

Weaknesses

I do not have many major qualms with the paper, but I'll list a few thoughts. Perhaps this is out of the scope of the paper, but I do feel it's difficult to draw conclusions about deep RL more broadly without investigating distributional RL. For example, this paper (https://openreview.net/pdf?id=ryeUg0VFwr) shows that distributional RL will likely do better with this off-policy data. It would be interesting to investigate the extent of this phenomenon in the distributional setting. CERL does s…

Reviewer 02 · Rating 8: accept, good paper · Confidence 5

Strengths

The reviewer liked the paper a lot. The main hypothesis makes sense and is substantiated in multiple experiments that show the effect nicely. The paper is well written and the figures are clearly readable. More detailed figures for individual environments are provided in the appendix, which is welcome to get an idea of how trustworthy the aggregate performance measures are. The proposed method is not terribly innovative but, to the best of this reviewer's knowledge, novel. The discussion on other repr…

Weaknesses

While the main paper is very well written and the experiments appear quite thorough, the reviewer took issue with the way some conclusions were presented. In particular, the connection to exploration (which is in the title) ignores some major alternative explanations of the results. While the reviewer recommends accepting the paper, some phrases *need* to be changed, and some discussion needs to be added, to prevent the casual reader from misinterpreting the text and results. These are: 1.…

Reviewer 03 · Rating 6: marginally above the acceptance threshold · Confidence 4

Strengths

I like the way this paper is presented. It is clearly motivated by empirical discoveries, together with reasoning, and followed by solutions to the identified problems. The authors put great effort into conducting and presenting experiments. Results are reported in a statistically identifiable way. I really appreciate it.

Weaknesses

On high-level motivation: I'm lost on the motivation for using ensembles in **policy learning**. As has been demonstrated in [EDAC] and [REDQ], I acknowledge that using ensemble learning for the **value function** could lead to improved performance, as the value can be more accurate, with uncertainty. But what is the motivation for having **multiple policies** in an ensemble (because they are sample generators, rather than learners)? Shouldn't those samplers aim at more efficiently decreasing…
