Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation
Yunhan Huang, Quanyan Zhu

TL;DR
This paper reveals that reinforcement learning agents in linear quadratic control systems are vulnerable to cost manipulation attacks, which can mislead the agent into dangerous policies with minimal falsification of cost signals.
Contribution
It introduces a convex-optimization-based attack model for falsifying cost signals and demonstrates its effectiveness on two LQG learners, a batch RL learner and an adaptive dynamic programming (ADP) learner, highlighting security vulnerabilities.
Findings
Small cost falsifications change the optimal policy by a bounded amount, with a bound linear in the size of the falsification.
Attacks can mislead learners into dangerous policies.
Minimal falsification (around 2.3%) is sufficient for successful attack.
Abstract
In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals. We show that a small falsification of the cost parameters leads to only a bounded change in the optimal policy, and that the bound is linear in the amount of falsification the attacker can apply to the cost parameters. We propose an attack model in which the attacker aims to mislead the agent into learning a 'nefarious' policy by intentionally falsifying the cost parameters. We formulate the attacker's problem as a convex optimization problem and develop necessary and sufficient conditions to check the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: the batch RL learner and the adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% of falsification on the cost data, the…
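The sensitivity claim in the abstract (a small change in the cost parameters produces a bounded, roughly proportional change in the optimal policy) can be illustrated numerically. The sketch below is not the paper's attack model or its convex program. It assumes a hypothetical discrete-time double-integrator system, made-up cost matrices `Q` and `R`, and an arbitrary falsification direction `D`, and it uses plain Riccati value iteration to compute the LQR gain before and after the cost is perturbed.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=2000, tol=1e-12):
    """Discrete-time LQR gain K (control law u = -K x) via Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update: P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)

# Hypothetical system (assumption, not from the paper): double integrator, step 0.1.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # true state cost (assumed)
R = np.array([[1.0]])    # true control cost (assumed)

# Arbitrary falsification direction applied to Q (assumed for illustration).
D = np.array([[1.0, 0.0],
              [0.0, 0.0]])

K_true = lqr_gain(A, B, Q, R)

eps_values = [0.01, 0.02, 0.04]
changes = []
for eps in eps_values:
    K_false = lqr_gain(A, B, Q + eps * D, R)
    changes.append(np.linalg.norm(K_false - K_true))

for eps, change in zip(eps_values, changes):
    print(f"eps = {eps:.2f}  ->  ||K_false - K_true|| = {change:.6f}")
```

Under these assumptions, doubling `eps` should roughly double the change in the gain, which is the behavior a linear bound predicts for small falsifications. The attacker's problem in the paper goes further than this sketch: it selects the falsification so that the learner converges to a specific target policy.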
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Smart Grid Security and Resilience · Adaptive Dynamic Programming Control
