Training Classifiers For Feedback Control
Hasan A. Poonawala, Niklas Lauffer, Ufuk Topcu

TL;DR
This paper proposes a method for training classifiers for feedback control systems that use high-dimensional sensor data, focusing on stability and performance, demonstrated through a navigation case study.
Contribution
It introduces a control-theoretic training approach for classifiers in feedback control, ensuring stability and performance without explicit state estimation.
Findings
Effective classifier training using projected gradient descent.
Improved stability in feedback control with learned classifiers.
Successful application to sensor-based navigation.
Hasan A. Poonawala, Niklas Lauffer, and Ufuk Topcu. This material is based upon work supported by the National Science Foundation under Grant No. 1646522 and Grant No. 1652113. Hasan A. Poonawala is with the Department of Mechanical Engineering, University of Kentucky, Lexington, KY 40506, USA. Niklas Lauffer is with the University of Texas, Austin, TX 78712, USA. Ufuk Topcu is with the Department of Aerospace Engineering, University of Texas, Austin, TX 78712, USA.
Abstract
One approach for feedback control using high dimensional and rich sensor measurements is to classify the measurement into one out of a finite set of situations, each situation corresponding to a (known) control action. This approach computes a control action without estimating the state. Such classifiers are typically learned from a finite amount of data using supervised machine learning algorithms. We model the closed-loop system resulting from control with feedback from classifier outputs as a piece-wise affine differential inclusion. We show how to train a linear classifier based on performance measures related to learning from data and the local stability properties of the resulting closed-loop system. The training method is based on the projected gradient descent algorithm. We demonstrate the advantage of training classifiers using control-theoretic properties on a case study involving navigation using range-based sensors.
I Introduction
A common situation in robotics involves using information-rich sensors, which provide high-dimensional measurements, to control the state of a robot in different environments. Examples of such sensors include cameras and LIDAR. Even though the available measurement is high-dimensional, the robot may often only need to identify the current situation it is in and apply a corresponding control, without explicit knowledge of the state. Obstacle avoidance using proximity sensors such as SONAR is an example of this strategy. A finite set of controls is often sufficient to achieve safe and stable operation of the robot in that environment, where each control in the set corresponds to one of the specific situations that is known to occur.
A classifier, trained using supervised learning methods, often performs the identification step. Once the measurement has been classified into one of the finitely many possible situations, the system uses a pre-designed control action associated with the classifier output. In many robotic systems, such as mobile robots, human expertise is sufficient to design these control actions. We refer to such a feedback control system as a classifier-in-the-loop system. Figure 1 depicts such a feedback mechanism. Several feedback systems in the literature are classifier-in-the-loop systems [1, 2]; however, the evaluation of their properties is almost always empirical. We seek to provide a more rigorous approach to the analysis and synthesis of classifiers used for control purposes.
Given training data, one can use supervised learning methods [3, 4] to design a classifier that assigns one of the finite possible controls to a measurement. A common approach to supervised learning involves solution of an optimization problem. The objective function typically consists of a loss function that penalizes errors between the classifier’s prediction for a measurement and the actual target value associated with that measurement in the dataset.
A low value of the loss function does not necessarily say anything about the properties of the resulting closed-loop system. We require a method to relate the closed-loop system properties with the parameters of the classifier. We wish to reformulate existing techniques for training classifiers in a way that is meaningful for their use as feedback controllers.
An important observation underlying the training methods we present is that a classifier-in-the-loop control scheme can be modeled [5] using switched [6] and/or hybrid system formalisms [7]. The classifier parameters dictate the switching (or guard) surfaces of the closed-loop system, so training the classifier is equivalent to determining the appropriate switching surfaces. Analysis of switched systems with variable switching surfaces is therefore central to training classifier-in-the-loop systems. Some methods exist to analyze or design such hybrid systems [8, 9, 10, 11, 12, 13]. We will use methods from [11, 12, 13].
Contributions
This work involves three contributions. First, we show how to model the control of dynamical systems via classification using piece-wise affine differential inclusions [12]. Second, we formulate the training problem for classifiers used in control as a constrained optimization problem, and derive the corresponding constraints using Lyapunov-based stability conditions appropriate for piece-wise affine differential inclusions [12, 11]. These constraints are bilinear in the optimization variables. Our third contribution is to develop an algorithm for solving the constrained optimization problem based on the projected gradient descent algorithm.
The work in this paper differs from [5] in that here we propose computational algorithms to design classifier-in-the-loop systems. We apply our training method for classification-based feedback control to a robot navigation problem simulated in ROS Gazebo.
II Control-Oriented Training Of Classifiers
Consider a classifier parametrized by a set of weights $w \in \mathbb{R}^{m}$ for some $m \in \mathbb{N}$. Training a classifier typically involves the solution of an optimization problem in the form
$$\min_{w} \; L(w, \mathcal{D}), \tag{1}$$
where $L$ is a loss function for $w$ evaluated on the data set $\mathcal{D}$.
Assuming that we can compute a gradient of $L$, an iterative algorithm to compute a local optimal point is given by
$$w_{k+1} = w_{k} - \alpha \, \nabla L(w_{k}, \mathcal{D}), \tag{2}$$
where $\nabla L$ is the gradient, $\alpha$ is the learning rate, and $w_{k}$ is the estimate of $w$ at the $k$th iteration.
We modify (1) to train classifiers that will be used for control in two ways. We add a term $L_{c}(w)$ to the objective function. We also add constraints on $w$ such that all feasible solutions of the optimization problem correspond to closed-loop systems with desired behavior. Let $\mathcal{W}$ be the feasible set under these constraints. The resulting optimization problem that trains classifiers for control is
$$\min_{w \in \mathcal{W}} \; L(w, \mathcal{D}) + L_{c}(w). \tag{3}$$
Given the constraints defining $\mathcal{W}$, we can compute an optimal solution using projected gradient descent. This procedure involves solution of the iterative equations
$$\hat{w}_{k+1} = w_{k} - \alpha \, \nabla \big( L(w_{k}, \mathcal{D}) + L_{c}(w_{k}) \big), \tag{4}$$
$$w_{k+1} = \arg\min_{w \in \mathcal{W}} \; \lVert w - \hat{w}_{k+1} \rVert. \tag{5}$$
Figure 2 depicts the procedure.
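The two-step iteration described above can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the feasible set is a single half-space (a convex stand-in, whereas the constraint set derived later in the paper is bilinear), the objective is a simple quadratic, and all names (`project_onto_halfspace`, `projected_gradient_descent`, `target`) are hypothetical.

```python
def project_onto_halfspace(w, a, b):
    """Euclidean projection of w onto the half-space {w : a . w <= b}."""
    violation = sum(ai * wi for ai, wi in zip(a, w)) - b
    if violation <= 0.0:
        return list(w)  # already feasible, nothing to do
    scale = violation / sum(ai * ai for ai in a)
    return [wi - scale * ai for wi, ai in zip(w, a)]


def projected_gradient_descent(grad, project, w0, step=0.1, iters=200):
    """Alternate an unconstrained gradient step with a projection step."""
    w = list(w0)
    for _ in range(iters):
        g = grad(w)
        w_free = [wi - step * gi for wi, gi in zip(w, g)]  # gradient step
        w = project(w_free)                                # projection step
    return w


# Toy problem (assumed): minimize ||w - target||^2 subject to w[0] + w[1] <= 1.
target = [2.0, 2.0]
a, b = [1.0, 1.0], 1.0
w_star = projected_gradient_descent(
    grad=lambda w: [2.0 * (wi - ti) for wi, ti in zip(w, target)],
    project=lambda w: project_onto_halfspace(w, a, b),
    w0=[0.0, 0.0],
)
print(w_star)  # close to [0.5, 0.5], the feasible point nearest to target
```

For a convex feasible set the projection has a unique solution, which is what makes this toy case easy; the later sections deal with the harder case where the projection itself is a non-convex problem.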
The key problems considered in this paper are three-fold. First, how do we robustly model the closed-loop system derived from a control scheme involving classification? Second, how do we derive the constraints on the classifier parameters that, if satisfied, imply desirable closed-loop properties of the modeled system? Third, given these constraints (equivalently, the constraint set they define), how do we solve the optimization problem in (5)?
The next three sections provide a solution to each of these problems. Briefly, we propose the use of piece-wise affine differential inclusions to model the closed-loop system, the use of piece-wise linear Lyapunov functions to certify closed-loop properties, and algorithms for solving biconvex optimization problems to solve (5). These solutions, together, allow us to train classifiers that yield control performance guarantees on the closed-loop system behavior. Note that some of the choices we make to solve each problem are important for being able to combine the three solutions.
III Classifier-in-the-Loop Systems
In this section, we show how to model the dynamics of a classifier-in-the-loop system as a piece-wise affine differential inclusion [12] of the form
$$\dot{x} \in F(x), \tag{6}$$
where $x \in \mathbb{R}^{n}$ and $F$ is a set-valued map constructed using affine functions.
III-A How Do We Model Classifiers For Control?
In general, a classifier is a map $c : \mathcal{Y} \to \mathcal{L}$ that assigns a unique label $l \in \mathcal{L}$ to a measurement $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the space of measurements. The set $\mathcal{L}$ is typically finite. Several methods are available to construct a classifier [3]. We focus our attention on classifiers that partition the measurement space into polytopes of identically classified points. This class of classifiers is fairly large, and includes linear classifiers, rectifier networks (common in deep learning), decision trees, and nearest-neighbor classifiers.
Let denote the parameters of a classifier . We refer to the procedure for obtaining using data as training the classifier. We represent the classifier as
[TABLE]
where , is the index set of convex polytopes in the partition induced by the classifier. Note that non-convex polytopes are easily divided into convex polytopes.
When , the classifier (7) becomes
[TABLE]
where are learned from data. In this case, the classifier parameters in (8) directly yield the partition (equivalently, parameters and ) of . For nearest-neighbor classifiers or decision trees, the parameters and will need to be derived from the trained classifier parameters .
When , one often constructs a classifier by combining multiple binary classifiers (8) in different ways. One way is to construct a decision tree, where every node is a binary classifier. Another way is to train multiple classifiers parameterized as , where each classifier distinguishes between one of the possible pairs of labels from . The partitions induced by this approach can be derived from . This multi-label classification scheme is known as one-vs-one classification. Instead, we can train classifiers that separate each label from all other labels. This classification scheme is known as one-vs-all classification.
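As an illustration of the one-vs-one scheme described above, the following sketch combines pairwise linear classifiers by majority vote. It is a minimal example under assumptions: the label names, the sign convention (a non-negative score votes for the first label of the pair), the tie-breaking rule, and the hand-picked weights are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def one_vs_one_predict(y, labels, classifiers):
    """Majority vote over pairwise linear classifiers.

    classifiers[(a, b)] = (w, bias); a score w . y + bias >= 0 votes for
    label a, otherwise for label b. Ties go to the earliest label in `labels`.
    """
    votes = {label: 0 for label in labels}
    for a, b in combinations(labels, 2):
        w, bias = classifiers[(a, b)]
        score = sum(wi * yi for wi, yi in zip(w, y)) + bias
        votes[a if score >= 0.0 else b] += 1
    return max(labels, key=lambda label: votes[label])


# Hypothetical one-dimensional measurement and hand-picked weights.
labels = ["forward", "left", "right"]
classifiers = {
    ("forward", "left"): ([-1.0], 0.5),   # forward if y <= 0.5
    ("forward", "right"): ([1.0], 0.5),   # forward if y >= -0.5
    ("left", "right"): ([1.0], 0.0),      # left if y >= 0
}
for y in ([0.0], [1.0], [-1.0]):
    print(y, one_vs_one_predict(y, labels, classifiers))
```

Each pairwise classifier contributes one hyperplane, so the regions of constant predicted label are intersections of half-spaces, which is the polytopic partition the section relies on.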
III-B What Data Are Required?
The set of training data generally consists of triples where is the state of the robot, is the measurement obtained in that state, is the class label associated with the measurement, and denotes the index of the triple in the dataset.
We assume that some labels correspond to known control actions that should be taken when the corresponding measurement is observed, according to the data. The classifier predicts labels for each measurement, effectively predicting a control action for each measurement.
To analyze the classifier, we also learn an approximate map . The approximation may be used for all states , or may be a local approximation using data corresponding to some neighborhood.
III-C How Do We Model The Control?
When we associate a specific control action to each label , the classifier effectively assigns a control to a measurement, where (see Figure 1). That is,
[TABLE]
Recall that is finite and switches in time between its elements. Note that we obtain the control input from the measurement, not the state.
We approximate each vector field using an affine differential inclusion , given by
[TABLE]
where is the convex hull operation, and is the (finite) index set of the affine vector fields that define . For to be a valid over-approximation of over some set , we require that .
III-D How Do We Derive the Closed-Loop Dynamics?
To derive a closed-loop model of the form (6), we must be able to model which labels may be assigned in a state . We assume that there is a continuous functional relationship between the measurement and the state, at least locally in space and time, so that . This assumption is reasonable for robots operating in slowly changing environments. We therefore model (9) as
[TABLE]
We will use an approximation of the map , learned from data, to define the dynamics in a state .
A key idea of our work is that the classifier partitions the measurement space , so that control (11) is effectively a state-based switching control. The dynamics switch between the vector fields , where , which we approximate by the inclusions . From equation (7), the partitions of induced by the partitions of are given by inequalities of the form . These inequalities may define cells with nonlinear boundaries.
To model the closed-loop system in a way that is robust to the uncertainty in the switching surfaces, we define convex polytopic partitions where the dynamics may be a combination of the differential inclusions that approximate the vector fields . Figure 3 depicts an example of this process. Figure 3a shows the case when the map is nonlinear, so that the switching surface is not planar.
There are two approaches to obtaining convex partitions. The first one involves linearization of followed by over-approximation, as in Figures 3c and 3d. The linear estimate in a neighborhood of a point is given by
[TABLE]
We locally approximate the partitions by the inequalities
[TABLE]
The set in Figures 3b and 3d is given by . The advantage of this approach is that when using binary classifiers of the form (8), the partition parameters and are linear in .
A second approach is to over-approximate the nonlinear boundary in Figure 3a using polyhedral sets, as in Figure 3b. While this approach is intuitively more appealing than one involving linearization, it is harder to express the partition parameters and as linear functions of .
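The linearization-based approach can be illustrated numerically. The sketch below is an assumption-laden example rather than the paper's construction: it takes a hypothetical measurement map `h`, a hypothetical linear classifier boundary `w . y + b >= 0` in measurement space, and returns the half-space `e . x + f >= 0` in state space obtained from a first-order expansion of `h` about `x0`. The Jacobian is estimated by central differences.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def numerical_jacobian(h, x0, eps=1e-6):
    """Central-difference Jacobian of h at x0, returned as a list of rows."""
    m, n = len(h(x0)), len(x0)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x0), list(x0)
        xp[j] += eps
        xm[j] -= eps
        yp, ym = h(xp), h(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2.0 * eps)
    return J


def pull_back_halfspace(w, b, h, x0):
    """Approximate {x : w . h(x) + b >= 0} near x0 by {x : e . x + f >= 0}."""
    J = numerical_jacobian(h, x0)
    y0 = h(x0)
    e = [sum(w[i] * J[i][j] for i in range(len(w))) for j in range(len(x0))]
    Jx0 = [dot(row, x0) for row in J]
    f = b + dot(w, [yi - ji for yi, ji in zip(y0, Jx0)])
    return e, f


# Hypothetical affine sensor map, for which the linearization is exact.
M = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
c = [1.0, 0.0, -1.0]
h = lambda x: [dot(row, x) + ci for row, ci in zip(M, c)]
e, f = pull_back_halfspace([1.0, -1.0, 2.0], 0.5, h, [0.3, -0.2])
print(e, f)  # approximately [7.0, 1.0] and -0.5
```

Both `e` and `f` are linear in the classifier parameters `(w, b)` once the Jacobian is fixed, which is the property the text highlights as the advantage of this approach.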
By combining the classifier representation (7), the differential inclusion dynamics (10), and the approximations (13), we model the classifier-in-the-loop system as
[TABLE]
which is a differential inclusion of the form .
III-E Is This Model Robust To Uncertainties?
The use of over-approximations inherently provides robustness to uncertainties and modeling errors. The caveat to this approach is that the over-approximation may exhibit far too many trajectories. Some of these trajectories may not satisfy the closed-loop properties we wish to certify for the system, while a tighter over-approximation may satisfy the control properties. Procedures to obtain such tight over-approximations are beyond the scope of this paper.
IV Control-Oriented Constraints On Classifier Parameters
Sufficient conditions for certifying the closed-loop properties of (14) become conditions on the classifier parameters , that is, they define in (5). The properties of interest to us include practical asymptotic stability (ultimate boundedness), forward set invariance, and asymptotic stability. These properties physically correspond to low set-point or tracking errors, or to safety via boundedness of the state. We want these properties to hold for all closed-loop trajectories.
Our approach follows much of the existing work on analysis of piece-wise affine differential inclusions [12, 13, 11]. These papers derive sufficient conditions under different assumptions and parameterizations of the certificates (typically Lyapunov functions) for control properties. We present the conditions in terms of our chosen parameterization below. We choose polyhedral Lyapunov functions as motivated by [13], the stability conditions come from results in [12], and methods to remove quantifiers from these conditions are inspired by [11].
A partition in is a collection of subsets ; where is an index set, , for each , and for each pair such that . We define the domain of the partition as . We also refer to the subsets in as the cells of the partition. Note that this definition allows some cells in to represent the boundary between other cells in , which is useful for handling sliding modes.
A piece-wise affine dynamical system associated with partition is a collection,
[TABLE]
that to each cell assigns the affine differential inclusion .
[TABLE]
The cell is given by
[TABLE]
We parameterize a continuous polyhedral Lyapunov function with a partition and a collection of vectors such that . Each set is given by , and we assume that is pointed at the origin.
We define index sets that denote the relationship between the system and the Lyapunov function . Let be the set of pairs of indices such that . Let be the set of all triples such that , , , and .
Sufficient conditions on a piece-wise differential inclusion and candidate Lyapunov function that certify the existence of the control properties under consideration are given in [12]. The result below formally states these conditions in terms of our parametrization and notation.
Lemma 1.
Let be a piece-wise affine dynamical system and be a candidate Lyapunov function. Let , , , and be the index sets associated with and . Let be connected, and let contain the origin. Let and be the largest and smallest level set of in .
If the set of constraints
[TABLE]
is feasible, then
1. is invariant
2. is ultimately bounded
Furthermore, if , then the origin of is asymptotically stable with region of attraction .
Proof.
See the appendix. ∎
V Control-Oriented Training Using Projected Gradient Descent
In this section, we present an algorithm to solve (5), given a representation of (a subset of) the set in terms of (18)-(23). The constraints (18)-(23) may be infeasible, since the candidate Lyapunov function (which always exists for suitable partition ) may not decrease along the dynamics of . To remedy this issue of infeasibility, we relax constraint (20). To account for this relaxation, we modify the objective function.
The following optimization problem implements (5):
[TABLE]
[TABLE]
where serve as slack variables that relax the equality constraint (20), and is a weighting factor. The optimization variables include , , , and . The optimization problem (24)-(30) has the following property.
Proposition 2.
Let be a piece-wise affine differential inclusion and be a candidate polyhedral Lyapunov function. The optimization problem (24)-(30) is feasible.
Proof.
The partition allows selection of vectors such that is continuous and for all by construction. This property of implies that constraints (25), (26), (29), and (30) are feasible. Since is unconstrained, (27) and (28) are feasible, independent of the values of the remaining optimization variables. ∎
The constraints (25)-(30) are bilinear in the variables of the optimization problem. Optimization problems with bilinear constraints are typically NP-hard. We use a variant of Alternate Convex Search (ACS) [14] to solve this optimization problem, given in Algorithm 1.
We begin with the classifier parameters obtained after a gradient descent step (4). We alternate between solving two convex optimization problems obtained by fixing a different subset of the variables in (24)-(30). We use a value of to obtain solutions where the slack variables are zero. The convexity and feasibility (Proposition 2) of these problems imply that solutions exist at every iteration.
The first step of an iteration fixes , and obtains a solution to (24)-(30) denoted by . We then solve (24)-(30) again; however, is now a variable, and variables remain variable. The variables , , and are fixed to the corresponding values , , and . The optimal solution is , and is used as the fixed value of in the next iteration of this procedure.
Note that the set may change as the partition changes. One way to avoid needing to recompute at every iteration of the Alternate Convex Search is to set . This approach is implicitly taken in [11, 13].
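The alternating structure of Alternate Convex Search can be seen on a much smaller biconvex problem than (24)-(30). The sketch below is a stand-in chosen for illustration, not Algorithm 1: it fits a rank-one product `u v^T` to a matrix `M`, a problem that is a convex least-squares problem in `u` for fixed `v` and in `v` for fixed `u`. All function names are hypothetical.

```python
def objective(M, u, v):
    """Squared error between M and the rank-one product u v^T."""
    return sum((M[i][j] - u[i] * v[j]) ** 2
               for i in range(len(u)) for j in range(len(v)))


def best_u(M, v):
    """Exact minimizer over u with v held fixed (least squares)."""
    vv = sum(vj * vj for vj in v)
    return [sum(M[i][j] * v[j] for j in range(len(v))) / vv
            for i in range(len(M))]


def best_v(M, u):
    """Exact minimizer over v with u held fixed (least squares)."""
    uu = sum(ui * ui for ui in u)
    return [sum(M[i][j] * u[i] for i in range(len(u))) / uu
            for j in range(len(M[0]))]


def alternate_convex_search(M, v0, iters=30):
    """Alternate between the two convex subproblems, logging the objective."""
    v = list(v0)
    u = best_u(M, v)
    history = [objective(M, u, v)]
    for _ in range(iters):
        v = best_v(M, u)
        history.append(objective(M, u, v))
        u = best_u(M, v)
        history.append(objective(M, u, v))
    return u, v, history


M = [[2.0, 0.0], [0.0, 1.0]]
u, v, history = alternate_convex_search(M, v0=[1.0, 1.0])
print(history[0], history[-1])  # the objective never increases
```

Because each half-step solves its subproblem exactly, the objective is non-increasing along the iterations. As with any alternating scheme on a biconvex problem, this guarantees monotone progress but not, in general, a global optimum.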
VI Case Study: Path Following
In our case study, we task a quadrotor equipped with an infra-red-based-scanning device to navigate a canyon-like terrain. We use the Gazebo robot simulation environment (see Figure 4), running on the Robot Operating System, to simulate this scenario. We demonstrate that the use of control constraints while training classifiers safeguards against unstable behavior.
VI-A Modeling
We model the quadrotor kinematics as those of a differential-drive-like mobile robot. That is, we command the quadrotor to achieve a forward velocity and an angular velocity . The simulated quadrotor, however, possesses full inertial and rotational dynamics and implements lower-level controllers to track the commanded velocities, allowing us to abstract away those full dynamics.
The corridor/canyon defines a path in the plane. We can attach a moving coordinate frame, known as a Frenet-Serret frame, to this path, and express the dynamics of the robot within this frame (see Figure 5). The configuration of the agent in the Frenet-Serret frame is , where angle is the heading of the robot with respect to the path-aligned axis of the frame, and offset is the distance between the robot’s location and the origin of the frame (which lies on the path). The origin corresponds to the robot being on the path with its heading aligned with the path tangent.
The robot uses three control inputs: , , and , where and are constants, so that . These vectors correspond to moving forward, turning left, and turning right respectively. The dynamics under each constant input in the local Frenet-Serret frame are given by
[TABLE]
where is the (unknown) local curvature of the path.
We approximate the nonlinear dynamics by the affine set-valued map given by
[TABLE]
where is a closed compact set that captures the variation in curvature considered for analysis. The dynamics and are affine and single-valued, so that
[TABLE]
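To make the modeling step concrete, the sketch below simulates a unicycle-type robot in a path-aligned frame under three constant inputs. It is a hedged illustration: it uses the textbook path-following kinematics, which may differ in form or sign convention from the equations used in the paper, and the numerical values of `V` and `OMEGA`, the label names, and the assumption that the robot keeps moving forward while turning are all hypothetical.

```python
import math

# Assumed command set: (forward speed, angular velocity) for each label.
V, OMEGA = 0.5, 0.4
CONTROLS = {
    "forward": (V, 0.0),
    "left": (V, OMEGA),
    "right": (V, -OMEGA),
}


def frenet_rates(theta, d, label, kappa):
    """Return (theta_dot, d_dot) for heading error theta and offset d.

    Assumed textbook model:
        d_dot     = v * sin(theta)
        theta_dot = omega - kappa * v * cos(theta) / (1 - kappa * d)
    where kappa is the local path curvature.
    """
    v, omega = CONTROLS[label]
    theta_dot = omega - kappa * v * math.cos(theta) / (1.0 - kappa * d)
    d_dot = v * math.sin(theta)
    return theta_dot, d_dot


def euler_step(theta, d, label, kappa, dt=0.01):
    """One forward-Euler step of the kinematics above."""
    theta_dot, d_dot = frenet_rates(theta, d, label, kappa)
    return theta + dt * theta_dot, d + dt * d_dot


# On a straight path, driving forward from the origin stays at the origin.
print(euler_step(0.0, 0.0, "forward", kappa=0.0))
# On a curved path, the same input makes the heading error drift.
print(frenet_rates(0.0, 0.0, "forward", kappa=0.2))
```

Under this assumed model the curvature enters the heading-error rate linearly when the state is fixed, so letting the curvature range over an interval yields an interval of vector-field values, which is consistent with treating the forward dynamics as set-valued.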
VI-B Training Data and Classification
The training data consists of triples , where , and . The measurement is a vector of dimension . The data points for which and is , , or are labeled as , , and respectively. We collect this data in a path that has zero curvature. The entire data set is used to estimate , using polynomial regression. We take and to be and respectively.
In the rest of this section, we obtain a classifier by training three one-vs-one classifiers , and that distinguish between and , and , and and respectively. The loss function is
[TABLE]
where is a parameter we set as . The loss (31) implements a support vector machine. The class labels are then given by
[TABLE]
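Since the text states that the loss implements a support vector machine, a regularized hinge loss gives a reasonable picture of what each pairwise classifier minimizes. The sketch below is an assumption: the exact form of (31) is not reproduced here, and the regularization weight `lam`, the averaging over the data, and the label encoding in {-1, +1} are illustrative choices.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hinge_loss(w, b, data, lam):
    """Regularized hinge loss; data is a list of (y, t) with t in {-1, +1}."""
    reg = 0.5 * lam * dot(w, w)
    hinge = sum(max(0.0, 1.0 - t * (dot(w, y) + b)) for y, t in data)
    return reg + hinge / len(data)


def hinge_subgradient(w, b, data, lam):
    """A subgradient of hinge_loss with respect to (w, b)."""
    gw = [lam * wi for wi in w]
    gb = 0.0
    for y, t in data:
        if t * (dot(w, y) + b) < 1.0:  # only margin violators contribute
            for j in range(len(w)):
                gw[j] -= t * y[j] / len(data)
            gb -= t / len(data)
    return gw, gb


# Two hypothetical one-dimensional measurements, one per class.
data = [([2.0], 1), ([-2.0], -1)]
print(hinge_loss([1.0], 0.0, data, lam=0.1))         # both margins satisfied
print(hinge_subgradient([0.0], 0.0, data, lam=0.1))  # points toward larger w
```

A subgradient of this kind is what a plain gradient-descent step would use before any projection onto the control-oriented constraints is applied.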
VI-C Control-Oriented Training
Let be the classifier obtained when training on data without control-oriented constraints. We sketch the closed-loop system due to in Figure 6. The points and are switched equilibria [6] when the curvature of the path is strictly positive and negative respectively. When the curvature is zero, every point on the line between and is an equilibrium point. All trajectories either begin on this equilibrium set, or approach either or . This analysis was presented in [5] to explain the work in [1].
To demonstrate the need for control-oriented training, we mislabel the training data, and train two sets of classifiers and using this mislabeled data. Specifically, we use the data corresponding to as , instead of the data corresponding to . We train by using gradient descent to minimize (31) on the mislabeled data.
We train so that a given point , at which and , is a locally asymptotically stable equilibrium when the path curvature is positive. Similarly, we want a point , at which and , to be a locally asymptotically stable equilibrium point when the path curvature is negative. We achieve this training by solving the constrained optimization formulation (4) and (5) to minimize (31) on the same data as , but subject to the following constraints. We constrain so that . We use a Lyapunov function to ensure that the switching between and renders (locally) asymptotically stable. The partition comprises the cells depicted in Figure 7. The dynamics , , and are as in Section VI-A, where . We solve the projection step (5) using Algorithm 1, with . Similarly, we train so that is a switched equilibrium and , with a different Lyapunov function as proof of local asymptotic stability. We train without any constraints, using loss function (31).
VI-D Results
We simulate the path-following control of a quadrotor when using and (separately). Figure 8 shows the resulting trajectories, in local coordinates. We see that for the classifier trained with control constraints, the trajectories reach the set of equilibrium points between and . The switching surfaces are similar to those in Figure 6a. For the classifier trained without constraints on , some trajectories move away from the origin; in fact, the quadrotor crashes in the simulation. In the remaining trajectories, the quadrotor reaches a switching surface between turning left and turning right, and consequently oscillates between the two without moving along the path.
VII Conclusions And Future Work
We have presented a novel training algorithm for classifiers that incorporates control-oriented constraints on the classifier parameters. We derived these constraints by modeling the closed-loop system as a piece-wise affine differential inclusion, and using polyhedral Lyapunov functions to verify desired closed-loop properties. We showed the usefulness of this training method in a simulation of a quadrotor navigating terrain by classifying high-dimensional sensor measurements into one of three possible velocities.
While we have demonstrated the value of the proposed training method through our case study, the method presents some issues to be addressed. Our method requires us to derive a piecewise affine differential inclusion that over-approximates the effect of the classifier-in-the-loop architecture. We do not provide a systematic method to derive the tightest possible over-approximation with respect to the control properties of interest. It is possible that the over-approximation we construct does not satisfy the control properties, even though a tighter one exists that would satisfy them. Furthermore, it is unclear how the choice of the partitions for the differential inclusion and the polyhedral Lyapunov function affects the convergence of Algorithm 1. We investigate these issues in future work.
In our framework, we have assumed that the set of controls is known, via prior knowledge or human expertise, and the measurements are labeled. In future work, we will investigate the case where the control set needs to be determined, and/or the measurements are unlabeled.
References
- [1] A. Giusti, J. Guzzi, D. C. Cireşan, F. L. He, J. P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, and L. M. Gambardella, “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, July 2016.
- [2] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 1334–1373, Jan. 2016. [Online]. Available: http://dl.acm.org/citation.cfm?id=2946645.2946684
- [3] E. Alpaydin, Introduction to Machine Learning, 2nd ed. The MIT Press, 2010.
- [4] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995.
- [5] H. A. Poonawala and U. Topcu, “Robustness of Classifier-in-the-Loop Control Systems: A Hybrid-Systems Approach,” in IEEE Conference on Decision and Control, 2017.
- [6] A. F. Filippov and F. M. Arscott, Differential equations with discontinuous righthand sides, ser. Mathematics and its Applications, 1988.
- [7] R. Goebel and R. Sanfelice, Hybrid Dynamical Systems: Modeling, Stability, and Robustness. Princeton University Press, 2012.
- [8] S. Prajna and A. Papachristodoulou, “Analysis of switched and hybrid systems - beyond piecewise quadratic methods,” in Proceedings of the 2003 American Control Conference, vol. 4, June 2003, pp. 2779–2784.
