Towards Sustainable Investment Policies Informed by Opponent Shaping

Juan Agustin Duque; Razvan Ciuca; Ayoub Echchahed; Hugo Larochelle; Aaron Courville

arXiv:2602.11829 · cs.LG · February 13, 2026

Open Access · 3 Reviews

TL;DR

This paper formalizes the conditions under which multi-agent simulations of climate-related investment decisions exhibit social dilemmas and demonstrates how opponent shaping algorithms can promote sustainable, cooperative behaviors among economic agents.

Contribution

It introduces a formal analysis of social dilemmas in InvestESG and applies Advantage Alignment to steer agent learning toward socially optimal outcomes.

Findings

01. Advantage Alignment biases learning toward cooperation

02. Strategic opponent shaping improves sustainability outcomes

03. Theoretical thresholds identify when social dilemmas occur

Abstract

Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a recently proposed multi-agent simulation that captures the dynamic interplay between investors and companies under climate risk. We provide a formal characterization of the conditions under which InvestESG exhibits an intertemporal social dilemma, deriving theoretical thresholds at which individual incentives diverge from collective welfare. Building on this, we apply Advantage Alignment, a scalable opponent shaping algorithm shown to be effective in general-sum games, to influence agent learning in InvestESG. We offer theoretical insights into why Advantage Alignment systematically favors socially beneficial equilibria by biasing learning dynamics toward cooperative outcomes. Our results demonstrate…
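The idea of a threshold at which individual incentives diverge from collective welfare can be illustrated with a toy linear public-goods game. This is a minimal sketch under stated assumptions, not the InvestESG model or the paper's derivation: the functions `payoff` and `is_social_dilemma`, and the parameters `cost`, `benefit_per_agent`, and `n_agents`, are hypothetical names introduced only for this illustration. Each agent may pay `cost` to mitigate, and every mitigation gives each agent a benefit of `benefit_per_agent`.

```python
def payoff(mitigates: bool, n_other_mitigators: int,
           cost: float, benefit_per_agent: float) -> float:
    """Payoff of one agent in a toy linear public-goods game (hypothetical model)."""
    total_mitigators = n_other_mitigators + (1 if mitigates else 0)
    # Every agent receives the shared benefit; only mitigators pay the cost.
    return benefit_per_agent * total_mitigators - (cost if mitigates else 0.0)


def is_social_dilemma(cost: float, benefit_per_agent: float, n_agents: int) -> bool:
    """True when mitigating is privately unprofitable but collectively profitable."""
    privately_unprofitable = benefit_per_agent < cost
    collectively_profitable = n_agents * benefit_per_agent > cost
    return privately_unprofitable and collectively_profitable


if __name__ == "__main__":
    # Illustrative numbers only: 5 agents, mitigation costs 1.0, returns 0.4 to each agent.
    print(is_social_dilemma(cost=1.0, benefit_per_agent=0.4, n_agents=5))
```

In this toy setting the dilemma exists exactly when `benefit_per_agent < cost < n_agents * benefit_per_agent`: not mitigating is then the better choice for each agent regardless of what the others do, even though universal mitigation leaves everyone better off than universal inaction.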

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Originality: The paper introduces a novel MARL environment that highlights a critical link in the broader climate-economic space, namely impact investing and greenwashing risks. Furthermore, the authors make use of SOTA learning algorithms.
- Quality: The authors are theoretically rigorous in their analysis and include an appendix proving that InvestESG is a social dilemma for certain values of lambda, along with ablation studies.
- Significance: Significance largely lies in highlighting the sc

Weaknesses

- The real-world impact is overstated. The problem with building international climate agreements is that cooperation is difficult to achieve; using an algorithm that is biased towards cooperative policies doesn't really capture that phenomenon. The paper is missing some connection to the actual climate-economic literature on the dynamics of impact investing.
- Theoretical results are only valid under strong, unrealistic assumptions.
- Interesting components (greenwashing, resilience investments) of environment dis

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- The paper's primary strength lies in its formal analysis of the InvestESG simulation. Instead of taking the environment at face value, the authors mathematically derive the precise conditions under which it functions as a true social dilemma.
- A second strength is the successful application of Advantage Alignment to this complex, high-dimensional economic simulation. It demonstrates a scalable method for finding cooperative, high-welfare solutions where standard MARL baselines fail.

Weaknesses

- The paper justifies excluding other OS methods like LOLA or BRS on the grounds of scalability. While reasonable, this means AA is only compared against non-shaping methods (IPPO/MAPPO). It is unclear whether AA is superior because it is an OS method or because it is a better OS method. Comparing AA to at least one other OS method on a scaled-down version of the $\alpha$-InvestESG environment would help support a stronger claim.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. **Clear analysis and diagnosis of InvestESG benchmark**: The paper formalizes the conditions under which InvestESG is a social dilemma. It then validates the predicted 𝛼 threshold empirically: the single-firm/investor sweep exhibits a sharp change near 𝛼≈30, and the full game shows the corresponding behavior at 𝛼=70. The proof is rigorous and the empirical implementation is well-executed. This is very useful to members of the community who want to use this benchmark for policy analysis and policymaking.
2. **Comparison between AdAli

Weaknesses

**Limited novelty**: The paper focuses on one *existing* benchmark (with some modifications) for deep analysis, and applies an *existing* opponent shaping algorithm to this benchmark. However, the results of the original AdAlign paper already show that this method is effective in maximizing social welfare compared to the PPO baseline on other benchmarks. Applying the same method to a variation of InvestESG might not be considered a *fundamental* innovation.

**Asymmetry in comparisons**: The pa

Taxonomy

Topics: Complex Systems and Time Series Analysis · Game Theory and Applications · Experimental Behavioral Economics Studies