Minimax optimal dual control -- The single input case
Anders Rantzer

TL;DR
This paper derives an explicit solution for the Bellman inequality in minimax optimal dual control, describing a policy that balances exploration and exploitation in linear systems.
Contribution
The paper gives an explicit solution to the dual control problem in the single-input case, including a randomized policy for use when the collected data is insufficient.
Findings
The optimal policy becomes a deterministic certainty equivalence controller once sufficient data has been collected.
When data is insufficient, the policy adds a randomized term to improve excitation of the system.
The Bellman inequality for minimax optimal dual control is solved explicitly.
Abstract
An explicit solution is derived for the Bellman inequality corresponding to minimax optimal dual control. The minimizing player determines control action as a function of past state measurements and inputs. The maximizing player selects disturbances and model parameters for the underlying linear time-invariant dynamics. The optimal minimizing policy is a dual controller that optimizes the tradeoff between exploration and exploitation. Once sufficient data has been collected, the policy becomes a deterministic certainty equivalence controller. However, when data is insufficient, the policy introduces a randomized term to improve excitation.
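The switch between a randomized exploratory input and a certainty equivalence controller can be illustrated with a small sketch. This is not the policy derived in the paper: the scalar plant x_next = a*x + b*u with known a and unknown b, the least-squares estimator, the excitation threshold, the probe amplitude, and all function names are assumptions chosen only to show the structure described in the abstract.

```python
import random


def least_squares_gain(history, a):
    """Least-squares estimate of b in x_next = a*x + b*u from (x, u, x_next) triples.

    Returns (b_hat, excitation), where excitation is the sum of squared inputs.
    """
    num = sum(u * (x_next - a * x) for x, u, x_next in history)
    den = sum(u * u for _, u, _ in history)
    return (num / den if den > 0 else 0.0), den


def dual_control_step(x, history, a=1.0, excitation_threshold=1.0, probe=1.0, rng=random):
    """Hypothetical dual controller: explore with a random sign until the data is informative."""
    b_hat, excitation = least_squares_gain(history, a)
    if excitation >= excitation_threshold and b_hat != 0.0:
        # Enough data: certainty equivalence (deadbeat) control using the estimate.
        return -a * x / b_hat
    # Insufficient data: randomized input to excite the system.
    return probe * rng.choice([-1.0, 1.0])


def simulate(b_true=-1.0, a=1.0, x0=1.0, steps=5, seed=0):
    """Run the sketch on a disturbance-free plant (an assumption made for clarity)."""
    rng = random.Random(seed)
    x, history, states = x0, [], [x0]
    for _ in range(steps):
        u = dual_control_step(x, history, a=a, rng=rng)
        x_next = a * x + b_true * u
        history.append((x, u, x_next))
        x = x_next
        states.append(x)
    return states, least_squares_gain(history, a)[0]


if __name__ == "__main__":
    states, b_hat = simulate()
    print("states:", states)
    print("estimated b:", b_hat)
```

In this toy setting a single random probe identifies b, after which the controller acts as if the estimate were exact. The minimax policy in the paper instead chooses the tradeoff between exploration and exploitation optimally against worst-case disturbances and parameters, which this sketch does not attempt.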
