Training Classifiers For Feedback Control

Hasan A. Poonawala; Niklas Lauffer; Ufuk Topcu

arXiv:1903.03688·math.OC·March 12, 2019·ACC

TL;DR

This paper proposes a method for training classifiers for feedback control systems that use high-dimensional sensor data, focusing on stability and performance, demonstrated through a navigation case study.

Contribution

It introduces a control-theoretic training approach for classifiers in feedback control, ensuring stability and performance without explicit state estimation.

Findings

1. Effective classifier training using projected gradient descent.
2. Improved stability in feedback control with learned classifiers.
3. Successful application to sensor-based navigation.

Equations

$$\min_{w\in\mathbb{R}^{s}} \; l_{data}(w), \tag{1}$$

$$w_{k+1}=w_{k}-\alpha_{k}\nabla l_{data}(w)^{T} \tag{2}$$

$$\min_{w\in\mathcal{W}} \; l_{data}(w)+l_{control}(w) \tag{3}$$

$$w^{\prime}_{k+1}=\dots \tag{4}$$

$$w_{k+1}=\dots \tag{5}$$

$$\dot{x}(t)\in\mathcal{A}(x(t)), \tag{6}$$

$$C(y)=b_{i}\quad\textrm{if } \bar{E}_{i}(w)y+\bar{e}_{i}(w)>0, \tag{7}$$

$$C(y)=\begin{cases}b_{1} & \textrm{if } w_{1}^{T}y+w_{0}>0\\ b_{2} & \textrm{if } w_{1}^{T}y+w_{0}<0,\end{cases} \tag{8}$$

$$u(t)=C(y(t)). \tag{9}$$

$$\mathcal{A}_{i}=co\left(\{A_{ik}x+a_{ik}\}_{k\in I_{\mathcal{A}_{i}}}\right), \tag{10}$$

$$u(t)=C(\mathcal{H}(x(t))). \tag{11}$$

$$y=\hat{\mathcal{H}}(x)=\hat{\mathcal{H}}(x_{e})+\frac{\partial\hat{\mathcal{H}}}{\partial x}(x-x_{e})+O((x-x_{e})^{2})\approx Hx+h. \tag{12}$$

$$\bar{E}_{i}(w)\mathcal{H}(x)+\bar{e}_{i}(w)\geq 0\;\approx\;\dots\;=\;\dots \tag{13}$$

$$\dot{x}\in\mathcal{A}_{i}\quad\textrm{if } E_{i}(w)x+e_{i}(w)\geq 0, \tag{14}$$

$$\Omega_{\mathcal{P}}=\{\mathcal{A}_{i}\}_{i\in I_{\mathcal{P}}} \tag{15}$$

$$\dot{x}(t)=co(\mathcal{A}_{i}),\quad\textrm{if } x_{i}(t)\in X_{i}. \tag{16}$$

$$X_{i}=\{x\in\mathbb{R}^{n}\colon E_{i}x+e_{i}\geq 0\}. \tag{17}$$

$$p_{i}=F_{i}^{T}\mu_{i}, \tag{18}$$

$$\mu_{i}\geq 1, \tag{19}$$

$$\begin{bmatrix}E_{i} & e_{i}\\ 0 & 1\end{bmatrix}^{T}v_{ijk}=-\begin{bmatrix}A_{ik}^{T}\\ a_{ik}^{T}\end{bmatrix}p_{j}, \tag{20}$$

$$v_{ijk}\geq 1, \tag{21}$$

$$p_{i}-p_{j}=\lambda_{ij}f_{ij}, \tag{22}$$

$$\lambda_{ij}\geq 1, \tag{23}$$

$$\min_{w,\,p_{i},\,u_{i},\,v_{ijk},\,\lambda_{ij}} \;\dots \tag{24}$$

$$p_{i}=F_{i}^{T}\mu_{i}, \tag{25}$$

$$\mu_{i}\geq 1, \tag{26}$$

$$q_{ijk}=\begin{bmatrix}E_{i}(w) & e_{i}(w)\\ 0 & 1\end{bmatrix}^{T}v_{ijk}+\begin{bmatrix}A_{ik}^{T}\\ a_{ik}^{T}\end{bmatrix}p_{j}, \tag{27}$$

$$v_{ijk}\geq 1, \tag{28}$$

$$p_{i}-p_{j}=\lambda_{ij}f_{ij}, \tag{29}$$

$$\lambda_{ij}\geq 1, \tag{30}$$

$$f(x,u_{1})=\begin{bmatrix}\frac{v^{*}\rho\cos(\psi)}{1-\rho d}\\ v^{*}\sin(\psi)\end{bmatrix},\quad f(x,u_{2})=\begin{bmatrix}\omega^{*}\\ 0\end{bmatrix},\quad\textrm{and}\quad f(x,u_{3})=\begin{bmatrix}-\omega^{*}\\ 0\end{bmatrix},$$

$$\mathcal{A}_{1}=co_{\rho\in P}\left(\begin{bmatrix}0 & -\rho v^{*}\\ v^{*} & 0\end{bmatrix}x+\begin{bmatrix}v^{*}\rho\\ 0\end{bmatrix}\right),$$

$$\mathcal{A}_{2}=f(x,u_{2})\quad\textrm{and}\quad\mathcal{A}_{3}=f(x,u_{3}).$$

$$l_{data}(w)=\lVert w\rVert_{2}+\gamma\sum_{k=1}^{N_{D}}\max(0,1-b^{k}y^{k}), \tag{31}$$

$$C(y)=\begin{cases}u_{2} & \textrm{if } (w_{1}^{12})^{T}y+w_{0}^{12}<0,\ (w_{1}^{23})^{T}y+w_{0}^{23}>0,\\ u_{3} & \textrm{if } (w_{1}^{13})^{T}y+w_{0}^{13}<0,\ (w_{1}^{23})^{T}y+w_{0}^{23}<0,\\ u_{1} & \textrm{otherwise}.\end{cases} \tag{32}$$
Full text

Training Classifiers For Feedback Control

Hasan A. Poonawala, Niklas Lauffer, and Ufuk Topcu

This material is based upon work supported by the National Science Foundation under Grant No. 1646522 and Grant No. 1652113. Hasan A. Poonawala is with the Department of Mechanical Engineering, University of Kentucky, Lexington, KY 40506, USA. Niklas Lauffer is with the University of Texas, Austin, TX 78712, USA. Ufuk Topcu is with the Department of Aerospace Engineering, University of Texas, Austin, TX 78712, USA.

Abstract

One approach for feedback control using high dimensional and rich sensor measurements is to classify the measurement into one out of a finite set of situations, each situation corresponding to a (known) control action. This approach computes a control action without estimating the state. Such classifiers are typically learned from a finite amount of data using supervised machine learning algorithms. We model the closed-loop system resulting from control with feedback from classifier outputs as a piece-wise affine differential inclusion. We show how to train a linear classifier based on performance measures related to learning from data and the local stability properties of the resulting closed-loop system. The training method is based on the projected gradient descent algorithm. We demonstrate the advantage of training classifiers using control-theoretic properties on a case study involving navigation using range-based sensors.

I Introduction

A common situation in robotics involves using information-rich sensors, which provide high-dimensional measurements, to control the state of a robot in different environments. Examples of such sensors include cameras and LIDAR. Even though the available measurement is high-dimensional, the robot may often only need to identify the current situation it is in and apply a corresponding control, without explicit knowledge of the state. Obstacle avoidance using proximity sensors such as SONAR is an example of this strategy. A finite set of controls is often sufficient to achieve safe and stable operation of the robot in that environment, where each control in the set corresponds to one of the specific situations that is known to occur.

A classifier, trained using supervised learning methods, often performs the identification step. Once the measurement has been classified into one of the finite possible situations, the system uses a pre-designed control action associated with the classifier output. In many robotic systems, such as mobile robots, human expertise is sufficient to design these control actions. We refer to such a feedback control system as a classifier-in-the-loop system. Figure 1 depicts such a feedback mechanism. Several feedback systems in the literature are classifier-in-the-loop systems [1, 2]; however, the evaluation of their properties is almost always empirical. We seek to provide a more rigorous approach to the analysis and synthesis of classifiers used for control purposes.

Given training data, one can use supervised learning methods [3, 4] to design a classifier that assigns one of the finite possible controls to a measurement. A common approach to supervised learning involves solution of an optimization problem. The objective function typically consists of a loss function that penalizes errors between the classifier’s prediction for a measurement and the actual target value associated with that measurement in the dataset.

A low value of the loss function does not necessarily say anything about the properties of the resulting closed-loop system. We require a method to relate the closed-loop system properties with the parameters of the classifier. We wish to reformulate existing techniques for training classifiers in a way that is meaningful for their use as feedback controllers.

An important observation that permits development of the training methods we will present involves the recognition that a classifier-in-the-loop control scheme can be modeled [5] using switched [6] and/or hybrid system formalisms [7]. The classifier parameters dictate the switching (or guard) surfaces of the closed-loop system. Training the classifier is equivalent to determining the appropriate switching surface. Analysis of switched systems with variable switching surfaces is central to training of classifier-in-the-loop systems. Some methods exist to analyze or design such hybrid systems [8, 9, 10, 11, 12, 13]. We will use methods from [11, 12, 13].

Contributions

This work involves three contributions. First, we show how to model the control of dynamical systems via classification using piece-wise affine differential inclusions [12]. Second, we formulate the training problem for classifiers used in control as a constrained optimization problem, and derive the corresponding constraints using Lyapunov-based stability conditions appropriate for piece-wise affine differential inclusions [12, 11]. These constraints are bilinear in the optimization variables. Our third contribution is to develop an algorithm for solving the constrained optimization problem based on the projected gradient descent algorithm.

The work in this paper differs from [5] in that here we propose computational algorithms to design classifier-in-the-loop systems. We apply our training method for classification-based feedback control to a robot navigation problem simulated in ROS Gazebo.

II Control-Oriented Training Of Classifiers

Consider a classifier $C$ parametrized by a set of weights $w\in\mathbb{R}^{s}$ for some $s\in\mathbb{N}$ . Training a classifier typically involves the solution of an optimization problem in the form

$$\min_{w\in\mathbb{R}^{s}} \; l_{data}(w), \tag{1}$$

where $l_{data}\colon\mathbb{R}^{s}\to\mathbb{R}$ is a loss function for $w$ evaluated on the data set $D$ .

Assuming that we can compute a gradient of $l_{data}(w)$ , an iterative algorithm to compute a local optimal point $w^{*}$ is given by

$$w_{k+1}=w_{k}-\alpha_{k}\nabla l_{data}(w)^{T}, \tag{2}$$

where $\nabla l_{data}(w)$ is the gradient, $\alpha_{k}$ is the learning rate, and $w_{k}$ is the estimate of $w^{*}$ at the $k^{\mathrm{th}}$ iteration.

We modify (1) to train classifiers that will be used for control in two ways. We add a term $l_{control}(w)$ to the objective function. We also add constraints on $w$ such that all feasible solutions of the optimization problem correspond to closed-loop systems with desired behavior. Let $\mathcal{W}$ be the feasible set under these constraints. The resulting optimization problem that trains classifiers for control is

$$\min_{w\in\mathcal{W}} \; l_{data}(w)+l_{control}(w). \tag{3}$$

Given the constraints defining $\mathcal{W}$ , we can compute an optimal solution $w^{*}$ using projected gradient descent. This procedure involves solution of the iterative equations

[TABLE]

Figure 2 depicts the procedure.
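The two-step iteration (a gradient step followed by a projection onto $\mathcal{W}$) can be illustrated on a toy problem. The sketch below is a minimal example and not the paper's Algorithm 1: it assumes a quadratic stand-in for $l_{data}+l_{control}$ and a box as the feasible set, so that the projection has a closed form. In the paper, $\mathcal{W}$ is defined by bilinear stability constraints and the projection itself requires solving an optimization problem. The names `grad_loss`, `project_box`, and `projected_gradient_descent` are hypothetical.

```python
def grad_loss(w, target):
    """Gradient of the toy loss 0.5 * ||w - target||^2 (an assumed stand-in)."""
    return [wi - ti for wi, ti in zip(w, target)]


def project_box(w, lower, upper):
    """Euclidean projection onto the box {w : lower <= w_i <= upper}."""
    return [min(max(wi, lower), upper) for wi in w]


def projected_gradient_descent(w0, target, lower, upper, step=0.5, iters=50):
    w = list(w0)
    for _ in range(iters):
        # Gradient step: unconstrained update, playing the role of w'_{k+1}.
        w_prime = [wi - step * gi for wi, gi in zip(w, grad_loss(w, target))]
        # Projection step: map w'_{k+1} back onto the feasible set.
        w = project_box(w_prime, lower, upper)
    return w


# The unconstrained minimizer (2, -1) lies outside the box [0, 1]^2,
# so the iterates settle at the nearest feasible point.
w_star = projected_gradient_descent([0.5, 0.5], target=[2.0, -1.0],
                                    lower=0.0, upper=1.0)
print(w_star)  # [1.0, 0.0]
```

The point of the toy example is that the loss alone would pull $w$ outside the feasible set, and the projection is what keeps every iterate admissible.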

The key problems considered in this paper are three-fold. First, how do we robustly model the closed-loop system derived from a control scheme involving classification? Second, how do we derive the constraints on the classifier parameters that, if satisfied, imply desirable closed-loop properties of the modeled system? Third, given these constraints (equivalently, the constraint set $\mathcal{W}$), how do we solve the optimization problem in (5)?

The next three sections provide a solution to each of these problems. Briefly, we propose the use of piece-wise affine differential inclusions to model the closed-loop system, the use of piece-wise linear Lyapunov functions to certify closed-loop properties, and algorithms for solving biconvex optimization problems to solve (5). These solutions, together, allow us to train classifiers that yield control performance guarantees on the closed-loop system behavior. Note that some of the choices we make to solve each problem are important for being able to combine the three solutions.

III Classifier-in-the-Loop Systems

In this section we show how to model the dynamics of a classifier-in-the-loop system as a piece-wise affine differential inclusion [12] of the form

$$\dot{x}(t)\in\mathcal{A}(x(t)), \tag{6}$$

where $x\in\mathbb{R}^{n}$ and $\mathcal{A}\colon\mathbb{R}^{n}\to 2^{\mathbb{R}^{n}}$ is a set-valued map constructed using affine functions.

III-A How Do We Model Classifiers For Control?

In general, a classifier $C\colon Y\to L$ is a map that assigns a unique label $b\in L$ to a measurement $y\in Y$, where $Y\subseteq\mathbb{R}^{m}$ is the space of measurements. The set $L$ is typically finite. Several methods are available to construct a classifier $C$ [3]. We focus our attention on classifiers that partition the measurement space into polytopes of identically classified points. This class of classifiers is fairly large, and includes linear classifiers, rectifier networks (common in deep learning), decision trees, and nearest-neighbor classifiers.

Let $w$ denote the parameters of a classifier $C$ . We refer to the procedure for obtaining $w$ using data as training the classifier. We represent the classifier $C$ as

$$C(y)=b_{i}\quad\textrm{if } \bar{E}_{i}(w)y+\bar{e}_{i}(w)>0, \tag{7}$$

where $i\in I$ , $I$ is the index set of convex polytopes in the partition induced by the classifier. Note that non-convex polytopes are easily divided into convex polytopes.

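When $\lvert L\rvert=2$, the classifier (7) becomes

$$C(y)=\begin{cases}b_{1} & \textrm{if } w_{1}^{T}y+w_{0}>0\\ b_{2} & \textrm{if } w_{1}^{T}y+w_{0}<0,\end{cases} \tag{8}$$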

where $w=(w_{1},w_{0})\in\mathbb{R}^{m+1}$ are learned from data. In this case, the classifier parameters $w$ in (8) directly yield the partition (equivalently, parameters $\bar{E}_{i}(w)$ and $\bar{e}_{i}(w)$ ) of $Y$ . For nearest-neighbor classifiers or decision trees, the parameters $\bar{E}_{i}(w)$ and $\bar{e}_{i}(w)$ will need to be derived from the trained classifier parameters $w$ .
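As an illustration, the binary rule assigns $b_{1}$ when $w_{1}^{T}y+w_{0}>0$ and $b_{2}$ when $w_{1}^{T}y+w_{0}<0$. The sketch below is a minimal version of that rule; the function name `classify_binary`, the label strings, and the two-dimensional measurements are assumptions made for the example.

```python
def classify_binary(y, w1, w0, labels=("b1", "b2")):
    """Return labels[0] if w1^T y + w0 > 0 and labels[1] if it is < 0."""
    score = sum(wi * yi for wi, yi in zip(w1, y)) + w0
    if score > 0:
        return labels[0]
    if score < 0:
        return labels[1]
    # On the switching surface the rule does not assign a label.
    return None


# Hypothetical parameters: the hyperplane y_1 = y_2 splits the plane.
w1, w0 = [1.0, -1.0], 0.0
print(classify_binary([2.0, 1.0], w1, w0))  # b1
print(classify_binary([0.0, 3.0], w1, w0))  # b2
```

The hyperplane $w_{1}^{T}y+w_{0}=0$ is the switching surface of the closed-loop system, which is why changing $w$ changes the dynamics and not only the prediction error.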

When $\lvert L\rvert>2$ , one often constructs a classifier $C\colon Y\to L$ by combining multiple binary classifiers (8) in different ways. One way is to construct a decision tree, where every node is a binary classifier. Another way is to train multiple classifiers parameterized as $w^{j}=(w_{1}^{j},w_{0}^{j})$ , where each classifier $w^{j}$ distinguishes between one of the $\binom{\lvert L\rvert}{2}$ possible pairs of labels from $L$ . The partitions induced by this approach can be derived from $w^{j}$ . This multi-label classification scheme is known as one-vs-one classification. Instead, we can train $\lvert L\rvert$ classifiers that separate each label from all other labels. This classification scheme is known as one-vs-all classification.
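For a concrete count, the one-vs-one scheme needs one binary classifier per unordered pair of labels. The short sketch below enumerates those pairs for three labels; the label names and the helper `one_vs_one_pairs` are illustrative assumptions.

```python
from itertools import combinations


def one_vs_one_pairs(labels):
    """List every unordered pair of labels; one binary classifier per pair."""
    return list(combinations(labels, 2))


pairs = one_vs_one_pairs(["u1", "u2", "u3"])
print(pairs)  # [('u1', 'u2'), ('u1', 'u3'), ('u2', 'u3')]
```

With three labels this gives three classifiers, whereas one-vs-all would also need three; the two schemes diverge in cost only for larger label sets.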

III-B What Data Are Required?

The set $D$ of training data generally consists of $N_{D}$ triples $(x^{k},y^{k},b^{k})$ where $x$ is the state of the robot, $y$ is the measurement obtained in that state, $b$ is the class label associated with the measurement, and $k$ denotes the index of the triple in the dataset.

We assume that some labels $b\in L$ correspond to known control actions $u_{i}\in U$ that should be taken when the corresponding measurement is observed, according to the data. The classifier predicts labels for each measurement, effectively predicting a control action for each measurement.

To analyze the classifier, we also learn an approximate map $y=\hat{\mathcal{H}}(x)$ . The approximation $\hat{\mathcal{H}}$ may be used for all states $x\in X$ , or may be a local approximation using data corresponding to some neighborhood.

III-C How Do We Model The Control?

When we associate a specific control action $u_{i}\in U$ with each label $b_{i}\in L$, the classifier $C$ effectively assigns a control $u_{i}\in U$ to a measurement, where $i\in\{1,2,\dots,\lvert U\rvert\}$ (see Figure 1). That is,

$$u(t)=C(y(t)). \tag{9}$$

Recall that $U$ is finite and $u(t)$ switches in time between its elements. Note that we obtain the control input from the measurement, not the state.

We approximate each vector field $f(x,u_{i})$ using an affine differential inclusion $\mathcal{A}_{i}$ , given by

$$\mathcal{A}_{i}=co\left(\{A_{ik}x+a_{ik}\}_{k\in I_{\mathcal{A}_{i}}}\right), \tag{10}$$

where $co(\cdot)$ is the convex hull operation, and $I_{\mathcal{A}_{i}}$ is the (finite) index set of the affine vector fields that define $\mathcal{A}_{i}$ . For $\mathcal{A}_{i}$ to be a valid over-approximation of $f(x,u_{i})$ over some set $S\subseteq\mathbb{R}^{n}$ , we require that $\cup_{x\in S}f(x,u_{i})\subseteq\mathcal{A}_{i}$ .

III-D How Do We Derive the Closed-Loop Dynamics?

To derive a closed-loop model of the form (6), we must be able to model which labels may be assigned in a state $x$ . We assume that there is a continuous functional relationship between the measurement and the state, at least locally in space and time, so that $y=\mathcal{H}(x)$ . This assumption is reasonable for robots operating in slowly changing environments. We therefore model (9) as

$$u(t)=C(\mathcal{H}(x(t))). \tag{11}$$

We will use an approximation $\hat{\mathcal{H}}\colon X\to Y$ of the map $\mathcal{H}$ , learned from data, to define the dynamics in a state $x$ .

A key idea of our work is that classifier $C$ partitions the measurement space $Y$ , so that control (11) is effectively a state-based switching control. The dynamics switches between the vector fields $f(x,u_{i})$ , where $i\in\{1,\dots,\lvert U\rvert\}$ , which we approximate by the inclusions $\mathcal{A}_{i}$ . From equation (7), the partitions of $\mathbb{R}^{n}$ induced by the partitions of $Y$ are given by inequalities of the form $\bar{E}_{i}(w)\hat{\mathcal{H}}(x)+\bar{e}_{i}(w)>0$ . These inequalities may define cells with nonlinear boundaries.

To model the closed-loop system in a way that is robust to the uncertainty in the switching surfaces, we define convex polytopic partitions where the dynamics may be a combination of the differential inclusions $\mathcal{A}_{i}$ that approximate the vector fields $f(x,u_{i})$ . Figure 3 depicts an example of this process. Figure 3a shows the case when the map $\mathcal{H}$ is nonlinear, so that the switching surface is not planar.

There are two approaches to obtaining convex partitions. The first one involves linearization of $\hat{\mathcal{H}}$ followed by over-approximation, as in Figures 3c and 3d. The linear estimate in a neighborhood of a point $x_{e}$ is given by

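$$y=\hat{\mathcal{H}}(x)=\hat{\mathcal{H}}(x_{e})+\frac{\partial\hat{\mathcal{H}}}{\partial x}(x-x_{e})+O((x-x_{e})^{2})\approx Hx+h. \tag{12}$$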

We locally approximate the partitions by the inequalities

[TABLE]

The set $\mathcal{A}_{3}$ in Figures 3b and 3d is given by $\mathcal{A}_{3}=co(\mathcal{A}_{1}\cup\mathcal{A}_{2})$ . The advantage of this approach is that when using binary classifiers of the form (8), the partition parameters $E_{i}(w)$ and $e_{i}(w)$ are linear in $w$ .
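To see where the linearity in $w$ comes from, one can substitute a local linear estimate $y\approx Hx+h$ of the measurement map into a cell inequality of the form $\bar{E}_{i}(w)y+\bar{e}_{i}(w)\geq 0$. The block below is a sketch of that substitution; the explicit expressions it gives for $E_{i}(w)$ and $e_{i}(w)$ are a plausible reading and are not stated in this form in the text.

```latex
\begin{align*}
\bar{E}_{i}(w)\,(Hx+h)+\bar{e}_{i}(w) \geq 0
\quad\Longleftrightarrow\quad
\underbrace{\bar{E}_{i}(w)H}_{E_{i}(w)}\,x
+\underbrace{\bar{E}_{i}(w)h+\bar{e}_{i}(w)}_{e_{i}(w)} \geq 0 .
\end{align*}
% For a binary classifier with parameters w = (w_1, w_0), the two cells use
% \bar{E}(w) = \pm w_1^T and \bar{e}(w) = \pm w_0, so that
\begin{align*}
E(w) = \pm\, w_{1}^{T}H, \qquad e(w) = \pm\,(w_{1}^{T}h+w_{0}).
\end{align*}
```

Under this reading, both $E(w)$ and $e(w)$ depend linearly on $(w_{1},w_{0})$ once $H$ and $h$ are fixed.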

A second approach is to over-approximate the nonlinear boundary in Figure 3a using polyhedral sets, as in Figure 3b. While this approach is intuitively more appealing than one involving linearization, it is harder to express the partition parameters $E_{i}(w)$ and $e_{i}(w)$ as linear functions of $w$ .

By combining the classifier representation (7), the differential inclusion dynamics (10), and the approximations (13), we model the classifier-in-the-loop system as

$$\dot{x}\in\mathcal{A}_{i}\quad\textrm{if } E_{i}(w)x+e_{i}(w)\geq 0, \tag{14}$$

which is a differential inclusion of the form $\dot{x}\in\mathcal{A}(x)$ .

III-E Is This Model Robust To Uncertainties?

The use of over-approximations inherently provides robustness to uncertainties and modeling errors. The caveat to this approach is that the over-approximation may exhibit far too many trajectories. Some of these trajectories may not satisfy the closed-loop properties we wish to certify for the system, while a tighter over-approximation may satisfy the control properties. Procedures to obtain such tight over-approximations are beyond the scope of this paper.

IV Control-Oriented Constraints On Classifier Parameters

Sufficient conditions for certifying the closed-loop properties of (14) become conditions on the classifier parameters $w$ , that is, they define $\mathcal{W}$ in (5). The properties of interest to us include practical asymptotic stability (ultimate boundedness), forward set invariance, and asymptotic stability. These properties physically correspond to low set-point or tracking errors, or to safety via boundedness of the state. We want these properties to hold for all closed-loop trajectories.

Our approach follows much of the existing work on analysis of piece-wise affine differential inclusions [12, 13, 11]. These papers derive sufficient conditions under different assumptions and parameterizations of the certificates (typically Lyapunov functions) for control properties. We present the conditions in terms of our chosen parameterization below. We choose polyhedral Lyapunov functions as motivated by [13]; the stability conditions come from results in [12], and methods to remove quantifiers from these conditions are inspired by [11].

A partition $\mathcal{P}$ in $\mathbb{R}^{n}$ is a collection of subsets $\{X_{i}\}_{i\in I_{\mathcal{P}}}$ ; where $I_{\mathcal{P}}$ is an index set, $n\in\mathbb{N}$ , $X_{i}\subseteq\mathbb{R}^{n}$ for each $i\in I_{\mathcal{P}}$ , and $Int(X_{i})\cap Int(X_{j})=\emptyset$ for each pair $i,j\in I_{\mathcal{P}}$ such that $i\neq j$ . We define the domain $Dom(\mathcal{P})$ of the partition as $Dom(\mathcal{P})=\cup_{i\in I_{\mathcal{P}}}X_{i}$ . We also refer to the subsets $X_{i}$ in $\mathcal{P}$ as the cells of the partition. Note that this definition allows some cells in $\mathcal{P}$ to represent the boundary between other cells in $\mathcal{P}$ , which is useful for handling sliding modes.

A piece-wise affine dynamical system $\Omega_{\mathcal{P}}$ associated with partition $\mathcal{P}=\{X_{j}\}_{j\in I_{\mathcal{P}}}$ is a collection,

$$\Omega_{\mathcal{P}}=\{\mathcal{A}_{i}\}_{i\in I_{\mathcal{P}}} \tag{15}$$

that to each cell $X_{i}\in\mathcal{P}$ assigns the affine differential inclusion $\mathcal{A}_{i}=\{A_{ik}x+a_{ik}\}_{k\in I_{\mathcal{A}_{i}}}$ .

$$\dot{x}(t)=co(\mathcal{A}_{i}),\quad\textrm{if } x_{i}(t)\in X_{i}. \tag{16}$$

The cell $X_{i}\in\mathcal{P}$ is given by

$$X_{i}=\{x\in\mathbb{R}^{n}\colon E_{i}x+e_{i}\geq 0\}. \tag{17}$$

We parameterize a continuous polyhedral Lyapunov function $V_{\mathcal{Q}}(x)$ with a partition $\mathcal{Q}=\{Z_{j}\}_{j\in I_{\mathcal{Q}}}$ and a collection of vectors $\{p_{i}\}_{i\in I_{\mathcal{Q}}}$ such that $V_{\mathcal{Q}}(x)=p_{i}^{T}x,\textrm{ if }x\in Z_{i}\subseteq\mathbb{R}^{n}$ . Each set $Z_{j}\in\mathcal{Q}$ is given by $Z_{j}=\{x\in\mathbb{R}^{n}\colon{F_{j}x\geq 0}\}$ , and we assume that $Z_{j}$ is pointed at the origin.
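A small example may help fix the parameterization. The sketch below assumes the simplest partition $\mathcal{Q}$ of $\mathbb{R}^{2}$, the four quadrants, with $F_{j}=\mathrm{diag}(s_{1},s_{2})$ for sign patterns $s\in\{\pm 1\}^{2}$, and it picks $p_{j}=F_{j}^{T}\mu_{j}$ with $\mu_{j}=(1,1)$. With these illustrative choices $V_{\mathcal{Q}}(x)=\lvert x_{1}\rvert+\lvert x_{2}\rvert$. The names `CELLS`, `in_cell`, and `lyapunov` are hypothetical.

```python
from itertools import product

# Each cell Z_j = {x : F_j x >= 0} is a quadrant, with F_j = diag(s1, s2).
CELLS = []
for s1, s2 in product((1.0, -1.0), repeat=2):
    F = [[s1, 0.0], [0.0, s2]]
    p = [s1, s2]  # p_j = F_j^T mu_j with the assumed choice mu_j = (1, 1)
    CELLS.append((F, p))


def in_cell(F, x):
    """True if F x >= 0 holds componentwise."""
    return all(sum(fij * xj for fij, xj in zip(row, x)) >= 0 for row in F)


def lyapunov(x):
    """Evaluate V_Q(x) = p_j^T x using the first cell Z_j that contains x."""
    for F, p in CELLS:
        if in_cell(F, x):
            return sum(pi * xi for pi, xi in zip(p, x))
    raise ValueError("x lies outside the partition")


print(lyapunov([0.5, -2.0]))  # 2.5
```

Points on a shared boundary belong to two cells, and both cells return the same value there, which is the continuity requirement on $V_{\mathcal{Q}}$.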

We define index sets that denote the relationship between the system $\Omega_{\mathcal{P}}$ and the Lyapunov function $V_{\mathcal{Q}}$ . Let $I_{cont}\subseteq I_{\mathcal{Q}}\times I_{\mathcal{Q}}$ be the set of pairs of indices such that $Z_{i}\cap Z_{j}\neq\emptyset$ . Let $I_{dec}$ be the set of all triples $(i,j,k)$ such that $i\in I_{\mathcal{P}}$ , $k\in I_{\mathcal{A}_{i}}$ , $j\in I_{\mathcal{Q}}$ , and $X_{i}\cap Z_{j}\neq\emptyset$ .

Sufficient conditions on a piece-wise differential inclusion and candidate Lyapunov function that certify the existence of the control properties under consideration are given in [12]. The result below formally states these conditions in terms of our parametrization and notation.

Lemma 1.

Let $\Omega_{\mathcal{P}}$ be a piece-wise affine dynamical system and $V_{\mathcal{Q}}$ be a candidate Lyapunov function. Let $I_{\mathcal{P}}$ , $I_{\mathcal{Q}}$ , $I_{cont}$ , and $I_{dec}$ be the index sets associated with $\Omega_{\mathcal{P}}$ and $V_{\mathcal{Q}}$ . Let $Dom(\mathcal{P})$ be connected, and let $co(Dom(\mathcal{P}))$ contain the origin. Let $S_{max}$ and $S_{min}$ be the largest and smallest level set of $V_{\mathcal{Q}}(x)$ in $Dom(\mathcal{P})$ .

If the set of constraints

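$$p_{i}=F_{i}^{T}\mu_{i}, \tag{18}$$

$$\mu_{i}\geq 1, \tag{19}$$

$$\begin{bmatrix}E_{i} & e_{i}\\ 0 & 1\end{bmatrix}^{T}v_{ijk}=-\begin{bmatrix}A_{ik}^{T}\\ a_{ik}^{T}\end{bmatrix}p_{j}, \tag{20}$$

$$v_{ijk}\geq 1, \tag{21}$$

$$p_{i}-p_{j}=\lambda_{ij}f_{ij}, \tag{22}$$

$$\lambda_{ij}\geq 1, \tag{23}$$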

is feasible, then

1. $S_{max}$ is invariant.
2. $S_{min}$ is ultimately bounded.

Furthermore, if $0\in Dom(\mathcal{P})$ , then the origin of $\Omega_{\mathcal{P}}$ is asymptotically stable with region of attraction $S_{max}$ .

Proof.

See the appendix. ∎
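A short calculation indicates how constraints of this type certify decrease of $V_{\mathcal{Q}}$. It is a sketch under two assumptions: the decrease constraint has the form $\bar{E}_{i}^{T}v_{ijk}=-\begin{bmatrix}A_{ik}^{T}\\ a_{ik}^{T}\end{bmatrix}p_{j}$ with $v_{ijk}\geq 1$, and the block layout is $\bar{E}_{i}=\begin{bmatrix}E_{i} & e_{i}\\ 0 & 1\end{bmatrix}$. The notation $\bar{x}=(x,1)$ is introduced here for the example.

```latex
% Take x in X_i \cap Z_j, so that E_i x + e_i >= 0 and V_Q(x) = p_j^T x.
\begin{align*}
p_{j}^{T}(A_{ik}x+a_{ik})
  &= \bar{x}^{T}\begin{bmatrix}A_{ik}^{T}\\ a_{ik}^{T}\end{bmatrix}p_{j}
   = -\,\bar{x}^{T}\bar{E}_{i}^{T}v_{ijk} \\
  &= -\begin{bmatrix}E_{i}x+e_{i}\\ 1\end{bmatrix}^{T}v_{ijk}
   \;\leq\; -1 ,
\end{align*}
% because every entry of E_i x + e_i is nonnegative on X_i and every entry
% of v_{ijk} is at least 1. Similarly, p_j = F_j^T \mu_j with \mu_j >= 1 gives
\begin{align*}
V_{\mathcal{Q}}(x) = \mu_{j}^{T}F_{j}x \;\geq\; 0
\quad\text{whenever } F_{j}x \geq 0 .
\end{align*}
```

Since the bound holds for each vertex field $A_{ik}x+a_{ik}$, it also holds for every element of their convex hull, which is what makes a finite set of constraints sufficient.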

V Control-Oriented Training Using Projected Gradient Descent

In this section, we present an algorithm to solve (5), given a representation of (a subset of) the set $\mathcal{W}$ in terms of (18)-(23). The constraints (18)-(23) may be infeasible, since the candidate Lyapunov function (which always exists for a suitable partition $\mathcal{Q}$) may not decrease along the dynamics of $\Omega_{\mathcal{P}}$. To remedy this issue of infeasibility, we relax constraint (20). To account for this relaxation, we modify the objective function.

The following optimization problem implements (5):

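$$\min_{w,\,p_{i},\,u_{i},\,v_{ijk},\,\lambda_{ij}} \;\dots \tag{24}$$

subject to

$$p_{i}=F_{i}^{T}\mu_{i}, \tag{25}$$

$$\mu_{i}\geq 1, \tag{26}$$

$$q_{ijk}=\begin{bmatrix}E_{i}(w) & e_{i}(w)\\ 0 & 1\end{bmatrix}^{T}v_{ijk}+\begin{bmatrix}A_{ik}^{T}\\ a_{ik}^{T}\end{bmatrix}p_{j}, \tag{27}$$

$$v_{ijk}\geq 1, \tag{28}$$

$$p_{i}-p_{j}=\lambda_{ij}f_{ij}, \tag{29}$$

$$\lambda_{ij}\geq 1, \tag{30}$$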

where $q_{ijk}$ serve as slack variables that relax the equality constraint (20), and $\beta\in\mathbb{R},\beta>0$ is a weighting factor. The optimization variables include $w$ , $p_{j}\ \forall j\in I_{\mathcal{Q}}$ , $v_{ijk}\ \forall(i,j,k)\in I_{dec}$ , and $u_{i}\ \forall i\in I_{\mathcal{Q}}$ . The optimization problem (24)-(30) has the following property.

Proposition 2.

Let $\Omega_{\mathcal{P}}$ be a piece-wise affine differential inclusion and $V_{\mathcal{Q}}(x)$ be a candidate polyhedral Lyapunov function. The optimization problem (24)-(30) is feasible.

Proof.

The partition $\mathcal{Q}$ allows selection of vectors $\{p_{j}\}_{j\in I_{\mathcal{Q}}}$ such that $V_{\mathcal{Q}}(x)$ is continuous and $V_{\mathcal{Q}}(x)>0$ for all $x\neq 0$ by construction. This property of $V_{\mathcal{Q}}(x)$ implies that constraints (25), (26), (29), and (30) are feasible. Since $q_{ijk}$ is unconstrained, (27) and (28) are feasible, independent of the values of the remaining optimization variables. ∎

The constraints (25)-(30) are bilinear in the variables of the optimization problem. Optimization problems with bilinear constraints are typically NP-hard. We use a variant of Alternate Convex Search (ACS) [14] to solve this optimization problem, given in Algorithm 1.

We begin with the classifier parameters $w^{\prime}(k+1)$ obtained after a gradient descent step (4). We alternate between solving two convex optimization problems obtained by fixing a different subset of the variables in (24)-(30). We use a value of $\beta\ll 1$ to obtain solutions where the slack variables are zero. The convexity and feasibility (Proposition 2) of these problems imply that solutions exist at every iteration.

The first step of an iteration fixes $w$ and obtains a solution to (24)-(30), denoted by $(p_{j}^{*},u_{i}^{*},v_{ijk}^{*},\lambda_{ij}^{*})$. We then solve (24)-(30) again, this time treating $w$ as a variable; the variables $p_{j}$ also remain free. The variables $u_{i}$, $v_{ijk}$, and $\lambda_{ij}$ are fixed to the corresponding values $u_{i}^{*}$, $v_{ijk}^{*}$, and $\lambda_{ij}^{*}$. The optimal solution is $(p_{j}^{**},w^{*})$, and $w^{*}$ is used as the fixed value of $w$ in the next iteration of this procedure.

Note that the set $I_{dec}$ may change as the partition $\mathcal{P}$ changes. One way to avoid needing to recompute $I_{dec}$ at every iteration of the Alternate Convex Search is to set $\mathcal{P}=\mathcal{Q}$ . This approach is implicitly taken in [11, 13].
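The alternating pattern can be seen on a toy biconvex problem. The sketch below is a generic Alternate Convex Search, not the paper's Algorithm 1: it assumes the objective $(xy-2)^{2}+0.1(x^{2}+y^{2})$, which is convex in $x$ for fixed $y$ and convex in $y$ for fixed $x$, so each subproblem has a closed-form minimizer. All function names are hypothetical.

```python
def objective(x, y, reg=0.1):
    """Toy biconvex objective: convex in x for fixed y, and vice versa."""
    return (x * y - 2.0) ** 2 + reg * (x * x + y * y)


def argmin_one_block(other, reg=0.1):
    """Closed-form minimizer over one variable with the other held fixed.

    Setting the derivative 2*(x*y - 2)*y + 2*reg*x to zero gives
    x = 2*y / (y**2 + reg); the expression is symmetric in x and y.
    """
    return 2.0 * other / (other * other + reg)


def alternate_convex_search(x, y, iters=20):
    history = [objective(x, y)]
    for _ in range(iters):
        x = argmin_one_block(y)  # convex subproblem in x, with y fixed
        y = argmin_one_block(x)  # convex subproblem in y, with x fixed
        history.append(objective(x, y))
    return x, y, history


x_opt, y_opt, history = alternate_convex_search(1.0, 0.5)
print(history[0], history[-1])
```

Because each half-step solves its subproblem exactly, the objective never increases from one iteration to the next. The procedure offers no guarantee of reaching a global minimum, which is consistent with the hardness of bilinear problems noted above.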

VI Case Study: Path Following

In our case study, we task a quadrotor equipped with an infra-red-based-scanning device to navigate a canyon-like terrain. We use the Gazebo robot simulation environment (see Figure 4), running on the Robot Operating System, to simulate this scenario. We demonstrate that the use of control constraints while training classifiers safeguards against unstable behavior.

VI-A Modeling

We model the quadrotor kinematics as those of a differential-drive-like mobile robot. That is, we command the quadrotor to achieve a forward velocity $v$ and an angular velocity $\omega$. The simulated quadrotor, however, possesses full inertial and rotational dynamics and implements lower-level controllers to track the commanded velocities, allowing us to abstract away those full dynamics.

The corridor/canyon defines a path in the plane. We can attach a moving coordinate frame, known as a Frenet-Serret frame, to this path, and express the dynamics of the robot within this frame (see Figure 5). The configuration of the agent in the Frenet-Serret frame is $x=(\psi,d)$ , where angle $\psi$ is the heading of the robot with respect to the path-aligned axis of the frame, and offset $d$ is the distance between the robot’s location and the origin of the frame (which lies on the path). The origin $x=0$ corresponds to the robot being on the path with its heading aligned with the path tangent.

The robot uses three control inputs: $u_{1}=\begin{bmatrix}v^{*}&0\end{bmatrix}^{T}$ , $u_{2}=\begin{bmatrix}0&\omega^{*}\end{bmatrix}^{T}$ , and $u_{3}=\begin{bmatrix}0&-\omega^{*}\end{bmatrix}^{T}$ , where $v^{*}>0$ and $\omega^{*}>0$ are constants, so that $U=L=\{u_{1},u_{2},u_{3}\}$ . These vectors correspond to moving forward, turning left, and turning right respectively. The dynamics under each constant input $u_{i}\in U$ in the local Frenet-Serret frame are given by

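$$f(x,u_{1})=\begin{bmatrix}\frac{v^{*}\rho\cos(\psi)}{1-\rho d}\\ v^{*}\sin(\psi)\end{bmatrix},\quad f(x,u_{2})=\begin{bmatrix}\omega^{*}\\ 0\end{bmatrix},\quad\textrm{and}\quad f(x,u_{3})=\begin{bmatrix}-\omega^{*}\\ 0\end{bmatrix},$$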

where $\rho$ is the (unknown) local curvature of the path.
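For experimentation, the three vector fields can be coded directly. The sketch below assumes the component order $(\dot{\psi},\dot{d})$, the forward-motion field $\left(v^{*}\rho\cos\psi/(1-\rho d),\ v^{*}\sin\psi\right)$, and the turning fields $(\pm\omega^{*},0)$; the constants are the values of $v^{*}$ and $\omega^{*}$ used in the case study, and the function name `f` with an integer `u_index` argument is a hypothetical interface.

```python
import math

V_STAR = 0.5   # forward speed v* in m/s (value used in the case study)
W_STAR = 0.15  # turn rate omega* in rad/s (value used in the case study)


def f(x, u_index, rho):
    """Vector field (psi_dot, d_dot) under input u_1, u_2, or u_3."""
    psi, d = x
    if u_index == 1:  # move forward
        return (V_STAR * rho * math.cos(psi) / (1.0 - rho * d),
                V_STAR * math.sin(psi))
    if u_index == 2:  # turn left
        return (W_STAR, 0.0)
    if u_index == 3:  # turn right
        return (-W_STAR, 0.0)
    raise ValueError("u_index must be 1, 2, or 3")


# On a straight path (rho = 0) the aligned, centered robot is at rest
# in the Frenet-Serret coordinates while moving forward.
print(f((0.0, 0.0), 1, rho=0.0))  # (0.0, 0.0)
```

Only the forward-motion field depends on the curvature $\rho$, which is why only $\mathcal{A}_{1}$ needs to be set-valued.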

We approximate the nonlinear dynamics $f(x,u_{1})$ by the affine set-valued map $\mathcal{A}_{1}$ given by

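$$\mathcal{A}_{1}=co_{\rho\in P}\left(\begin{bmatrix}0 & -\rho v^{*}\\ v^{*} & 0\end{bmatrix}x+\begin{bmatrix}v^{*}\rho\\ 0\end{bmatrix}\right),$$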

where $P\subset\mathbb{R}$ is a closed compact set that captures the variation in curvature considered for analysis. The dynamics $f(x,u_{2})$ and $f(x,u_{3})$ are affine and single-valued, so that

$$\mathcal{A}_{2}=f(x,u_{2})\quad\textrm{and}\quad\mathcal{A}_{3}=f(x,u_{3}).$$

VI-B Training Data and Classification

The training data $D$ consists of triples $(x^{k},y^{k},b^{k})$, where $x^{k}=(\psi^{k},d^{k})$, $d^{k}\in\{0.5\textrm{ m},0\textrm{ m},-0.5\textrm{ m}\}$ and $\psi^{k}\in\{\pi/6\textrm{ rad},0\textrm{ rad},-\pi/6\textrm{ rad}\}$. The measurement $y$ is a vector of dimension $420$. The data points $x^{k}$ for which $d^{k}=0$ and $\psi^{k}$ is $\pi/6\textrm{ rad}$, $0\textrm{ rad}$, or $-\pi/6\textrm{ rad}$ are labeled as $u_{3}$, $u_{1}$, and $u_{2}$ respectively. We collect this data on a path that has zero curvature. The entire data set is used to estimate $\hat{\mathcal{H}}$ using polynomial regression. We take $v^{*}$ and $\omega^{*}$ to be $0.5\textrm{ m/s}$ and $0.15\textrm{ rad/s}$ respectively.

In the rest of this section, we obtain a classifier $C$ by training three one-vs-one classifiers $w^{12}$ , $w^{13}$ and $w^{23}$ that distinguish between $u_{1}$ and $u_{2}$ , $u_{1}$ and $u_{3}$ , and $u_{2}$ and $u_{3}$ respectively. The loss function is

$$l_{data}(w)=\lVert w\rVert_{2}+\gamma\sum_{k=1}^{N_{D}}\max(0,1-b^{k}y^{k}), \tag{31}$$

where $\gamma>0$ is a parameter we set as $100$ . The loss (31) implements a support vector machine. The class labels are then given by

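$$C(y)=\begin{cases}u_{2} & \textrm{if } (w_{1}^{12})^{T}y+w_{0}^{12}<0,\ (w_{1}^{23})^{T}y+w_{0}^{23}>0,\\ u_{3} & \textrm{if } (w_{1}^{13})^{T}y+w_{0}^{13}<0,\ (w_{1}^{23})^{T}y+w_{0}^{23}<0,\\ u_{1} & \textrm{otherwise}.\end{cases} \tag{32}$$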
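The way the three one-vs-one classifiers combine into a single label can be sketched as follows. The rule assumed here selects $u_{2}$ when the $w^{12}$ score is negative and the $w^{23}$ score is positive, selects $u_{3}$ when the $w^{13}$ and $w^{23}$ scores are both negative, and selects $u_{1}$ otherwise. The function names and the two-dimensional toy measurements are hypothetical; the measurements in the case study have dimension 420.

```python
def score(w, y):
    """Affine score w1^T y + w0 for a binary classifier w = (w1, w0)."""
    w1, w0 = w
    return sum(a * b for a, b in zip(w1, y)) + w0


def classify(y, w12, w13, w23):
    """Combine three one-vs-one classifiers into a label in {u1, u2, u3}."""
    if score(w12, y) < 0 and score(w23, y) > 0:
        return "u2"
    if score(w13, y) < 0 and score(w23, y) < 0:
        return "u3"
    return "u1"


# Hypothetical parameters chosen only to exercise each branch.
w12 = ([1.0, 0.0], 0.0)
w13 = ([-1.0, 0.0], 0.0)
w23 = ([0.0, 1.0], 0.0)
print(classify([-1.0, 1.0], w12, w13, w23))  # u2
print(classify([1.0, -1.0], w12, w13, w23))  # u3
print(classify([0.0, 0.0], w12, w13, w23))   # u1
```

In this convention a positive score of $w^{ab}$ favors $u_{a}$ over $u_{b}$, and moving forward ($u_{1}$) is the default when neither turning condition holds.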

VI-C Control-Oriented Training

Let $C_{0}$ be the classifier obtained when training on data without control-oriented constraints. We sketch the closed-loop system due to $C_{0}$ in Figure 6. The points $x^{1}_{e}$ and $x_{e}^{2}$ are switched equilibria [6] when the curvature of the path is strictly positive and negative respectively. When the curvature is zero, every point on the line $\psi=0$ between $x^{1}_{e}$ and $x_{e}^{2}$ is an equilibrium point. All trajectories either begin on this equilibrium set, or approach either $x^{1}_{e}$ or $x_{e}^{2}$ . This analysis was presented in [5] to explain the work in [1].

To demonstrate the need for control-oriented training, we mislabel the training data, and train two sets of classifiers $C_{1}$ and $C_{2}$ using this mislabeled data. Specifically, we use the data corresponding to $(\psi^{k},d^{k})=(\pi/6\textrm{ rad},0.5\textrm{ m})$ as $u_{1}$ , instead of the data corresponding to $(\psi^{k},d^{k})=(0\textrm{ rad},0\textrm{ m})$ . We train $C_{1}$ by using gradient-descent to minimize (31) on the mislabeled data.

We train $C_{2}$ so that a given point $x_{e}^{1}$, at which $\psi=0$ and $d>0$, is a locally asymptotically stable equilibrium when the path curvature is positive. Similarly, we want a point $x_{e}^{2}$, at which $\psi=0$ and $d<0$, to be a locally asymptotically stable equilibrium when the path curvature is negative. We achieve this by solving the constrained optimization formulation (4) and (5) to minimize (31) on the same data as $C_{1}$, subject to the following constraints. We constrain $w^{13}$ so that $(w_{1}^{13})^{T}\mathcal{H}(x_{e}^{1})+w_{0}^{13}=0$, that is, so that $x_{e}^{1}$ lies on the decision boundary of $w^{13}$. We use a Lyapunov function $V_{\mathcal{Q}_{1}}(x-x_{e}^{1})$ to ensure that the switching between $\mathcal{A}_{1}$ and $\mathcal{A}_{3}$ renders $x_{e}^{1}$ (locally) asymptotically stable. The partition $\mathcal{Q}_{1}$ comprises $16$ cells, depicted in Figure 7. The dynamics $\mathcal{A}_{1}$, $\mathcal{A}_{2}$, and $\mathcal{A}_{3}$ are as in Section VI-A, where $\rho=1\textrm{ m}$. We solve the projection step (5) using Algorithm 1, with $\beta=0.001$. Similarly, we train $w^{12}$ so that $x_{e}^{2}$ is a switched equilibrium, now with $\rho=-1\textrm{ m}$ and a different Lyapunov function $V_{\mathcal{Q}_{2}}(x-x_{e}^{2})$ as proof of local asymptotic stability. We train $w^{23}$ without any constraints, using loss function (31).
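To illustrate the role of the equality constraint, the sketch below runs projected gradient descent with only the constraint $w_{1}^{T}\mathcal{H}(x_{e})+w_{0}=0$ enforced. This is a simplification and not Algorithm 1: the Lyapunov-based constraints are omitted, so the projection reduces to a closed-form Euclidean projection onto a hyperplane. The argument `h_e` stands for $\mathcal{H}(x_{e})$, and the function names, step size, and iteration count are hypothetical.

```python
def project_onto_switching_constraint(w, w0, h_e):
    """Euclidean projection of (w, w0) onto the set where w . h_e + w0 = 0."""
    residual = sum(wi * hi for wi, hi in zip(w, h_e)) + w0
    norm_sq = sum(hi * hi for hi in h_e) + 1.0  # squared norm of the vector (h_e, 1)
    scale = residual / norm_sq
    return [wi - scale * hi for wi, hi in zip(w, h_e)], w0 - scale

def projected_gradient_descent(grad_fn, w, w0, h_e, step=1e-3, iters=1000):
    """Alternate a gradient step on the loss with a projection onto the constraint.

    grad_fn(w, w0) must return (grad_w, grad_w0) of the data loss.
    """
    w, w0 = project_onto_switching_constraint(w, w0, h_e)
    for _ in range(iters):
        gw, g0 = grad_fn(w, w0)
        w = [wi - step * gi for wi, gi in zip(w, gw)]
        w0 = w0 - step * g0
        w, w0 = project_onto_switching_constraint(w, w0, h_e)
    return w, w0
```

Every iterate returned by the projection satisfies the equality constraint up to floating-point error, so the decision boundary passes through the chosen equilibrium throughout training. Whether that equilibrium is stable is a separate question, which this simplified projection does not address.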

VI-D Results

We simulate the path-following control of a quadrotor when using $C_{1}$ and $C_{2}$ (separately). Figure 8 shows the resulting trajectories in local coordinates. For the classifier $C_{2}$ trained with control constraints, the trajectories reach the set of equilibrium points between $x_{e}^{1}$ and $x_{e}^{2}$. The switching surfaces are similar to those in Figure 6a. For the classifier $C_{1}$ trained without constraints on $w$, some trajectories move away from the origin; in fact, the quadrotor crashes in the simulation. In the remaining trajectories, the quadrotor reaches a switching surface between turning left and turning right, and consequently oscillates between the two without moving along the path.

VII Conclusions And Future Work

We have presented a novel training algorithm for classifiers that incorporates control-oriented constraints on the classifier parameters. We derived these constraints by modeling the closed-loop system as a piece-wise affine differential inclusion, and using polyhedral Lyapunov functions to verify desired closed-loop properties. We showed the usefulness of this training method in a simulation of a quadrotor navigating terrain by classifying high-dimensional sensor measurements into one of three possible velocities.

While we have demonstrated the value of the proposed training method through our case study, the method presents some issues to be addressed. Our method requires us to derive a piece-wise affine differential inclusion that over-approximates the effect of the classifier-in-the-loop architecture. We do not provide a systematic method to derive the tightest possible over-approximation with respect to the control properties of interest. It is possible that the over-approximation we construct does not satisfy the control properties, even though a tighter one exists that would satisfy them. Furthermore, it is unclear how the choice of the partitions for the differential inclusion and the polyhedral Lyapunov function affects the convergence of Algorithm 1. We will investigate these issues in future work.

In our framework, we have assumed that the set of controls $U$ is known, via prior knowledge or human expertise, and the measurements are labeled. In future work, we will investigate the case where the control set $U$ needs to be determined, and/or the measurements are unlabeled.

Bibliography

[1] A. Giusti, J. Guzzi, D. C. Cireşan, F. L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella, “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, July 2016.
[2] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 1334–1373, Jan. 2016. [Online]. Available: http://dl.acm.org/citation.cfm?id=2946645.2946684
[3] E. Alpaydin, Introduction to Machine Learning, 2nd ed. The MIT Press, 2010.
[4] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995.
[5] H. A. Poonawala and U. Topcu, “Robustness of Classifier-in-the-Loop Control Systems: A Hybrid-Systems Approach,” in IEEE Conference on Decision and Control, 2017.
[6] A. F. Filippov and F. M. Arscott, Differential equations with discontinuous righthand sides, ser. Mathematics and its Applications, 1988.
[7] R. Goebel and R. Sanfelice, Hybrid Dynamical Systems: Modeling, Stability, and Robustness. Princeton University Press, 2012.
[8] S. Prajna and A. Papachristodoulou, “Analysis of switched and hybrid systems - beyond piecewise quadratic methods,” in Proceedings of the 2003 American Control Conference, vol. 4, June 2003, pp. 2779–2784.
