Information Revelation and Alignment Faking in Stochastic Differential Games
Daniel Ralston, Xu Yang, Ruimeng Hu

TL;DR
This paper investigates how players in stochastic differential games reveal, or fake, information about hidden parameters, and proposes a control framework that balances information gain against detectability, with implications for strategic interactions.
Contribution
The paper introduces a new alignment-faking control problem in stochastic differential games and uses proxy Fisher information and coupled Riccati equations to keep the analysis tractable.
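The paper's proxy Fisher information is its own construction and its exact form is not given here. As a loose illustration of the underlying idea, the sketch below uses a standard fact about diffusions: for dX = μ(θ, X) dt + σ dW with known σ and μ linear in θ, the observed Fisher information about θ carried by a path on [0, T] is (1/σ²)∫(∂μ/∂θ)² dt. The Ornstein-Uhlenbeck drift μ = −θX, the function names, and all parameter values are hypothetical choices for the example, not the paper's model.

```python
import numpy as np


def simulate_ou(theta, sigma, x0, T, n_steps, rng):
    """Euler-Maruyama path of the hypothetical drift model dX = -theta * X dt + sigma dW."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # Brownian increments
    for i in range(n_steps):
        x[i + 1] = x[i] - theta * x[i] * dt + sigma * dw[i]
    return x, dt


def path_fisher_information(x, dt, sigma):
    """Observed Fisher information about theta, (1 / sigma^2) * int (d mu / d theta)^2 dt.

    Here mu(theta, x) = -theta * x, so d mu / d theta = -x; the integral is
    approximated by a left-point (Ito) Riemann sum.
    """
    return np.sum(x[:-1] ** 2) * dt / sigma ** 2


def drift_mle(x, dt):
    """Continuous-time maximum-likelihood estimate -int X dX / int X^2 dt."""
    dx = np.diff(x)
    return -np.sum(x[:-1] * dx) / (np.sum(x[:-1] ** 2) * dt)


rng = np.random.default_rng(0)
x, dt = simulate_ou(theta=1.5, sigma=0.5, x0=1.0, T=200.0, n_steps=200_000, rng=rng)
info = path_fisher_information(x, dt, sigma=0.5)
theta_hat = drift_mle(x, dt)
print(f"Fisher information ~ {info:.1f}, theta_hat = {theta_hat:.3f}, "
      f"approx. std. error = {1 / np.sqrt(info):.3f}")
```

With these values the stationary variance is σ²/(2θ), so the information should be close to T/(2θ) ≈ 67 and the estimate should fall within a few multiples of 1/√I ≈ 0.12 of the true θ. A larger drift sensitivity ∂μ/∂θ means more information per unit time, which is the quantity a player trading off revelation against detectability would care about.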
Findings
Alignment faking can significantly increase information gain when the model is accurate.
Faking strategies may also make the deception easier to detect.
Proxy Fisher information may misestimate true information under model misspecification.
Abstract
In competitive games with private objectives, actions can reveal information about hidden parameters. Quantifying such information revelation, however, is substantially more challenging, since it depends not only on the opponent's hidden parameter but also on the opponent's model of the game. We study this problem via a two-player linear-quadratic stochastic differential game under partial information, in which each player knows its own coupling parameter and models the opponent's hidden parameter through a prior. Starting from the full-information game, we characterize the Nash equilibrium by coupled Riccati equations. We then define baseline implementable controls by averaging the equilibrium under each player's prior. Building on this baseline, we formulate an alignment-faking control problem in which one player trades off fidelity to its implementable policy against information…
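The abstract does not spell out the game's dynamics or costs. To make "coupled Riccati equations" and "averaging the equilibrium under each player's prior" concrete, the sketch below assumes a hypothetical scalar game dX = (aX + b₁u₁ + b₂u₂) dt + σ dW in which player i minimizes E[∫(qᵢX² + rᵢuᵢ²) dt + gᵢX_T²]. Its feedback Nash equilibrium is uᵢ = −(bᵢ/rᵢ)Pᵢ(t)X, where, with mᵢ = bᵢ²/rᵢ, the coupled Riccati equations are −dP₁/dt = 2aP₁ + q₁ − m₁P₁² − 2m₂P₁P₂ and −dP₂/dt = 2aP₂ + q₂ − m₂P₂² − 2m₁P₁P₂, with Pᵢ(T) = gᵢ. The opponent's state weight q₂ stands in for the hidden parameter; this is an illustrative assumption, not the paper's coupling parameter or its exact baseline construction.

```python
import numpy as np


def riccati_rhs(p, a, b, r, q):
    """Right-hand side F(P) of -dP/dt = F(P) for the hypothetical scalar two-player game.

    p, b, r, q hold one entry per player; m_i = b_i^2 / r_i.
    """
    m = np.asarray(b, dtype=float) ** 2 / np.asarray(r, dtype=float)
    p1, p2 = p
    return np.array([
        2 * a * p1 + q[0] - m[0] * p1 ** 2 - 2 * m[1] * p1 * p2,
        2 * a * p2 + q[1] - m[1] * p2 ** 2 - 2 * m[0] * p1 * p2,
    ])


def solve_coupled_riccati(a, b, r, q, terminal, T, n_steps):
    """Integrate the coupled Riccati ODEs backward from P(T) = terminal with classical RK4.

    Returns the time grid and P(t) with shape (n_steps + 1, 2).
    """
    h = T / n_steps
    P = np.empty((n_steps + 1, 2))
    P[-1] = terminal
    for j in range(n_steps, 0, -1):
        p = P[j]
        # In reversed time tau = T - t the system reads dP/dtau = F(P).
        s1 = riccati_rhs(p, a, b, r, q)
        s2 = riccati_rhs(p + 0.5 * h * s1, a, b, r, q)
        s3 = riccati_rhs(p + 0.5 * h * s2, a, b, r, q)
        s4 = riccati_rhs(p + h * s3, a, b, r, q)
        P[j - 1] = p + h * (s1 + 2 * s2 + 2 * s3 + s4) / 6
    return np.linspace(0.0, T, n_steps + 1), P


def prior_averaged_gain(theta_values, weights, a, b, r, q1, terminal, T, n_steps):
    """Average player 1's equilibrium gain K1(t; theta) = (b1 / r1) * P1(t; theta)
    over a discrete prior on the opponent's hidden parameter (used here as q2)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    gains = []
    for theta in theta_values:
        _, P = solve_coupled_riccati(a, b, r, [q1, theta], terminal, T, n_steps)
        gains.append(b[0] / r[0] * P[:, 0])
    # Baseline implementable feedback for player 1: u1(t, x) = -K1_bar(t) * x.
    return w @ np.array(gains)


a, b, r = -0.2, [1.0, 1.0], [1.0, 1.0]
K1_bar = prior_averaged_gain([0.5, 1.0, 2.0], [0.25, 0.5, 0.25],
                             a, b, r, q1=1.0, terminal=[0.0, 0.0], T=5.0, n_steps=500)
print(K1_bar[0], K1_bar[-1])
```

Because each equilibrium control is linear in the state, averaging the feedback law over the prior is the same as averaging the gain, which is why the sketch averages K₁(t; θ) directly. Over a long horizon Pᵢ(0) approaches a root of F(P) = 0, which gives a quick sanity check on the integrator.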
Taxonomy
Topics: Game Theory and Applications · Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics
