Online Learning of Feasible Strategies in Unknown Environments

Santiago Paternain; Alejandro Ribeiro

arXiv:1604.02137 · math.OC · April 8, 2016 · IEEE Trans. Autom. Control

TL;DR

This paper introduces an online saddle point controller for environments with time-varying convex constraints and costs, enabling agents to learn feasible and near-optimal strategies with bounded or sublinear regret and constraint violations.

Contribution

It proposes a novel online control method using saddle point dynamics that guarantees bounded or sublinear growth of regret and constraint violations in unknown, dynamic environments.
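
As a rough illustration of what a saddle point controller does, the sketch below runs a discrete-time (Euler) primal-descent, dual-ascent update on a scalar toy problem. Everything concrete here is an assumption made for illustration and is not taken from the paper: the cost `(x - c(t))^2`, the constraint `x - b(t) <= 0`, the step size `eta`, and the function names. The paper's own setting is stated in terms of integrals over time; this is only a discretized sketch.

```python
import math

def cost_grad(t, x):
    # Gradient in x of the hypothetical cost f0(t, x) = (x - c(t))^2,
    # with c(t) = 2 + 0.5 * sin(t).
    return 2.0 * (x - (2.0 + 0.5 * math.sin(t)))

def constraint(t, x):
    # Hypothetical time-varying constraint f(t, x) = x - b(t) <= 0,
    # with b(t) = 1 + 0.2 * cos(t).
    return x - (1.0 + 0.2 * math.cos(t))

def constraint_grad(t, x):
    # Gradient in x of the constraint above (constant for this toy case).
    return 1.0

def saddle_point_run(steps=5000, eta=0.01, x0=0.0):
    """Euler-discretized saddle point update (illustrative sketch only).

    Returns the final action, the final multiplier, and the accumulated
    constraint value (a discrete stand-in for the fit).
    """
    x, lam, fit = x0, 0.0, 0.0
    for k in range(steps):
        t = k * eta
        g = constraint(t, x)
        fit += eta * g  # accumulate the constraint value over time
        # Primal step: descend the Lagrangian f0 + lam * f in x.
        x_next = x - eta * (cost_grad(t, x) + lam * constraint_grad(t, x))
        # Dual step: ascend in lam, projected onto lam >= 0.
        lam = max(0.0, lam + eta * g)
        x = x_next
    return x, lam, fit

x_final, lam_final, fit = saddle_point_run()
print(f"x = {x_final:.3f}, lambda = {lam_final:.3f}, fit = {fit:.3f}")
```

In this discretization the projected dual step gives `lam_next >= lam + eta * g` at every iteration, so with a zero initial multiplier the accumulated constraint value never exceeds the final multiplier. Keeping the multiplier bounded is therefore enough to keep the accumulated violation bounded in this toy setting.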

Findings

01 The controller achieves fit and regret that are bounded or grow sublinearly.

02 The method handles constraints and costs that vary arbitrarily over time.

03 Numerical experiments validate the approach in a shepherding scenario.

Abstract

Define an environment as a set of convex constraint functions that vary arbitrarily over time, and consider a cost function that is also convex and arbitrarily varying. Agents that operate in this environment intend to select actions that are feasible at all times while minimizing the time average of the cost. Such an action is said to be optimal and can be computed offline if the cost and the environment are known a priori. An online policy is one that depends causally on the cost and the environment. To compare online policies to the optimal offline action, define the fit of a trajectory as a vector that integrates the constraint violations over time, and its regret as the cost difference with the optimal action accumulated over time. Fit measures the extent to which an online policy succeeds in learning feasible actions, while regret measures its success in learning optimal actions. This paper…
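
The two performance measures defined in the abstract can be written out explicitly. The symbols below are assumptions introduced here for illustration, since the abstract names none: x(t) is the action played at time t, f(t, x) is the vector of constraint functions (feasible when f(t, x) ≤ 0), f_0(t, x) is the cost, x^⋆ is the optimal offline action, and T is the time horizon.

```latex
% Fit: constraint values integrated along the trajectory (a vector).
\mathcal{F}_T = \int_0^T f\bigl(t, x(t)\bigr)\, dt

% Regret: accumulated cost of the trajectory minus that of the
% optimal offline action x^\star, which satisfies f(t, x^\star) \le 0
% for all t in [0, T].
\mathcal{R}_T = \int_0^T f_0\bigl(t, x(t)\bigr)\, dt
              - \int_0^T f_0\bigl(t, x^\star\bigr)\, dt
```

Under this reading, a policy learns feasible actions when every component of the fit stays bounded or grows sublinearly in T, and it learns optimal actions when the regret does the same.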

Taxonomy

Topics: Advanced Control Systems Optimization · Advanced Bandit Algorithms Research · Extremum Seeking Control Systems