Enhancement of Energy-Based Swing-Up Controller via Entropy Search
Chang Sik Lee, Dong Eui Chang

TL;DR
This paper enhances an energy-based swing-up controller for a rotary inverted pendulum by applying Bayesian optimization with Entropy Search, resulting in improved performance across different initial conditions.
Contribution
It introduces a novel application of Entropy Search Bayesian optimization to tune parameters of an energy-based swing-up controller for the Furuta pendulum.
Findings
Optimal controller outperforms nominal controller in simulations.
Performance improvements observed across various initial conditions.
Bayesian optimization effectively finds suitable controller parameters.
Abstract
An energy-based approach to stabilizing a mechanical system offers a simple yet powerful control scheme. However, since it does not impose strong constraints on the parameter space of the controller, finding appropriate parameter values for an optimal controller is known to be hard. This paper aims to generate an optimal energy-based controller for swinging up a rotary inverted pendulum, also known as the Furuta pendulum, by applying the Bayesian optimization method called Entropy Search. Simulations and experiments show that the optimal controller performs better than a nominal controller for various initial conditions.
Chang Sik Lee: School of Electrical Engineering, KAIST, Daejeon, Korea. [email protected]
Dong Eui Chang (corresponding author): School of Electrical Engineering, KAIST, Daejeon, Korea. [email protected]
This research has been supported in part by KAIST under grants N11180231 and N11190038, and by the ICT R&D program of MSIP/IITP [2016-0-00563, Research on Adaptive Machine Learning Technology Development for Intelligent Autonomous Digital Companion].
I INTRODUCTION
The task of stabilizing an underactuated mechanical system has been investigated for decades, and several methods have been proposed to address it [1, 2, 3, 4, 5, 6]. The idea of using a particular storage function built on the Euler-Lagrange equations of a mechanical system has provided a framework for an effective energy-based swing-up controller [7]. A drawback of this result is that, when it is applied to a real system, the controller requires ad hoc tuning over a multidimensional parameter space.
Meanwhile, the construction of optimally tuned controllers has been studied from many points of view [8, 9]. In recent years, as machine learning has spread to a variety of fields, it has also begun to influence the optimal control of mechanical systems [10, 11, 12, 13, 14, 15, 16]. Da et al. [12] deploy supervised learning methods to obtain more robust controllers for a 3D bipedal robot. In [13] and [16], reinforcement learning algorithms are used to compensate for unmodeled dynamics of systems. Furthermore, as a sample-efficient methodology for non-convex optimization problems, Bayesian optimization is widely adopted to optimize controllers [11, 14, 15].
However, the approaches in [11, 12, 13, 14, 15, 16] share the problem that they look for local minima. In contrast, Marco et al. [10] tackle the task of finding proper parameter values for a controller that optimally stabilizes a linear model by using Entropy Search [17], a machine learning method that seeks a global minimum of a given cost function.
This paper aims to use this machine learning optimization technique to resolve the drawback of the energy-based control [7] for stabilizing a nonlinear model. Specifically, we use an energy-based controller for a rotary inverted pendulum system, and we fit a Gaussian process estimation model through repeated evaluations of a cost function whose distribution is unknown, following the procedure of Entropy Search [17]. Consequently, we can globally estimate the parameter value that gives the best performance of the controller.
II PROBLEM STATEMENT
Kolesnichenko and Shiriaev [7] have proposed an energy-based swing-up controller for an underactuated mechanical system and provided sufficient conditions on the controller’s gain parameters for successful swing-up. However, not all parameter values satisfying these conditions guarantee swing-up of the real system. Moreover, even though most parameter values yield controllers that eventually drive the system to the desired swing-up equilibrium point, their performance may not always be satisfactory. Therefore, there still remains the laborious task of finding a set of parameter values that reaches the desired equilibrium point swiftly and with less oscillation.
The task to find such values of control parameters is formulated as an optimization problem with a cost function that properly reflects the desired performance,
[TABLE]
where the minimization is taken over a given parameter domain. To solve this optimization problem, we employ the Bayesian optimization technique called Entropy Search; refer to [17] for more details on Entropy Search. Entropy Search has the merit that, even when the values of the cost function are not known over the whole domain, it globally estimates the cost function and finds a reliable global minimum, while most other algorithms seek local minima.
III PRELIMINARIES
Before describing the main result, we provide background on Entropy Search.
III-A Entropy Search
The problem (1) can be stated as finding the parameter value that minimizes a cost function whose dependence on the parameter is not known a priori. Namely, the values of the cost function may not be available or observable for all parameter values. In such a situation, Bayesian optimization methods are quite useful, since they repeatedly estimate an arbitrary black-box function based on a probabilistic model and select an appropriate measurement point for more accurate modeling. Among the available Bayesian techniques, we choose Entropy Search, which efficiently finds a global minimum [17].
Two tools are required for Bayesian optimization. One is a probabilistic model for estimating the black box function based on measurements
[TABLE]
and the other is a decision rule for specifying a new point where the cost function will be evaluated, so that the estimation model approaches the actual values of the cost function.
First, as its estimation model, Entropy Search utilizes a Gaussian process. A Gaussian process is a non-parametric model generally used to estimate an unknown function. It is specified by a prior mean and a covariance function (kernel) between the function values at any two points of the domain. The former encodes the prior belief about the function and is usually a constant, and the latter describes the relationship between those two random variables. Given a set of evaluations (2) at a set of points given by
[TABLE]
the function value at a new point is a random variable with a Gaussian distribution with the posterior mean and variance given respectively by
[TABLE]
where
[TABLE]
Using the above equations, we can estimate the dependence of the cost on the controller parameters. For more details, refer to [17, 18].
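As an illustration of the Gaussian process update described above, the following Python sketch computes the textbook posterior mean and variance at new points (see [17, 18]). The names `sq_exp_kernel` and `gp_posterior`, the squared-exponential kernel, and the small `noise_var` jitter are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def sq_exp_kernel(A, B, length=1.0, sigma_f=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d).

    Assumed kernel for illustration only.
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, X_new, kernel, prior_mean=0.0, noise_var=1e-6):
    """Posterior mean and variance of a GP at X_new, given evaluations y at X."""
    K = kernel(X, X) + noise_var * np.eye(len(X))   # (n, n) Gram matrix
    K_s = kernel(X, X_new)                          # (n, m) cross-covariances
    K_ss = kernel(X_new, X_new)                     # (m, m) prior covariance
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean))
    mean = prior_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - (v ** 2).sum(axis=0)
    return mean, var
```

At points that have already been evaluated, the posterior mean reproduces the data and the variance collapses toward zero; far from all data, the prediction falls back to the prior mean and prior variance.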
Secondly, in order to determine the next measurement point, Entropy Search computes the expected change in entropy of the probability distribution over the location of the minimum, where the entropy and this distribution are defined as
[TABLE]
Here the distribution over the minimum location is induced by the Gaussian process estimate of the cost function, and the entropy is measured relative to the uniform distribution over the parameter domain. The next measurement point is then selected as the point with the largest expected change in entropy. This decision rule rests on the assumption that the point obtained in this way is the most informative one.
The cost function is measured at the new point, and the new point and the measured value are added to the set of points (3) and the set of evaluations (2), respectively, after which the two sets are updated. Entropy Search then returns a best-guess point at which the cost function is most likely to attain its minimum, that is, the point where the probability of being the minimizer is the largest. This completes a single iteration.
The process is repeated until the model has sufficiently converged to the objective function and the distribution over the minimum location is peaked around the optimum [18]. Namely, the process terminates when the posterior mean at the best guess does not change by more than a threshold for a given number of consecutive iterations. For more details, including the derivations, refer to [17].
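To make the two quantities above concrete, the following sketch estimates, on a finite set of candidate points, the probability that each candidate is the minimizer (often written p_min in the Entropy Search literature [17]) and measures how far that distribution is from uniform. This is a plain Monte Carlo approximation used here for illustration; Entropy Search itself relies on analytic approximations described in [17]. The function names and the use of the Kullback-Leibler divergence from the uniform distribution as the information measure are assumptions of this example.

```python
import numpy as np

def estimate_p_min(mean, cov, n_samples=5000, seed=0):
    """Monte Carlo estimate of P(candidate i is the minimizer).

    mean, cov: GP posterior mean vector and covariance matrix on the candidates.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    winners = samples.argmin(axis=1)               # index of the minimum per sample
    counts = np.bincount(winners, minlength=len(mean))
    return counts / n_samples

def kl_from_uniform(p):
    """KL divergence of a discrete distribution p from the uniform distribution."""
    n = len(p)
    nz = p > 0                                     # 0 * log(0) is taken as 0
    return float(np.sum(p[nz] * np.log(p[nz] * n)))
```

A value near zero means the model has no idea where the minimum lies, while the largest possible value, log(n) for n candidates, means all probability mass sits on one candidate.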
To sum up, given an initial condition, a termination threshold, a duration, and a set of evaluations (2) at arbitrary points (3), Entropy Search can be summarized as Algorithm 1.
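The overall loop structure (fit the model, pick a new point, evaluate, update the best guess, check the termination rule) can be sketched in Python as follows. This is not Algorithm 1 and its line numbers do not correspond to it: the entropy-based decision rule is replaced by a simple lower-confidence-bound rule, the best guess is taken as the minimizer of the posterior mean on a candidate grid, and all names (`kernel`, `posterior`, `bayes_opt_loop`, `eps`, `patience`, `max_iter`) are hypothetical.

```python
import numpy as np

def kernel(A, B, length=0.3):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def posterior(X, y, Xc, prior_mean=0.0, noise_var=1e-4):
    """GP posterior mean and variance at the candidate points Xc."""
    K = kernel(X, X) + noise_var * np.eye(len(X))
    Ks = kernel(X, Xc)
    mean = prior_mean + Ks.T @ np.linalg.solve(K, y - prior_mean)
    var = 1.0 - np.einsum("ij,ij->j", Ks, np.linalg.solve(K, Ks))
    return mean, np.maximum(var, 0.0)

def bayes_opt_loop(cost, candidates, X0, eps=1e-3, patience=5, max_iter=60):
    """Iterate until the best-guess posterior mean is stable or max_iter is hit."""
    X = np.array(X0, dtype=float)
    y = np.array([cost(x) for x in X])
    prev, stable, history = None, 0, []
    for _ in range(max_iter):
        mean, var = posterior(X, y, candidates)
        # Stand-in decision rule: lower confidence bound, NOT the entropy rule.
        x_next = candidates[np.argmin(mean - 2.0 * np.sqrt(var))]
        X = np.vstack([X, x_next])
        y = np.append(y, cost(x_next))
        # Best guess: minimizer of the updated posterior mean on the grid.
        mean, _ = posterior(X, y, candidates)
        i_best = int(np.argmin(mean))
        history.append(float(mean[i_best]))
        if prev is not None and abs(mean[i_best] - prev) <= eps:
            stable += 1
        else:
            stable = 0
        prev = mean[i_best]
        if stable >= patience:
            break
    return candidates[i_best], history
```

For example, calling `bayes_opt_loop` with a one-dimensional quadratic cost, a grid `np.linspace(0.0, 1.0, 101).reshape(-1, 1)` as candidates, and three initial points returns a best guess near the minimizer together with the history of best-guess posterior means, which is the quantity monitored by the termination rule.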
IV SWING UP OF THE FURUTA PENDULUM
IV-A Swing-Up Controller
As an underactuated mechanical system, we choose the Quanser QUBE Servo 2 [19], which is a kind of Furuta pendulum. We assume an ideal model of the Furuta pendulum system with no noise and no friction. The configuration of the system is described by the angle $q_{1}$ of the rotary arm and the angle $q_{2}$ of the inverted pendulum, as shown in Figure 1. The Lagrangian of the system is given by
[TABLE]
where
[TABLE]
with being the potential energy. The Euler-Lagrange equations of the system are computed as
[TABLE]
where
[TABLE]
and
[TABLE]
where and are masses, and are moments of inertia, and are lengths of rotary arm and pendulum respectively. The symbol denotes the gravitational acceleration and is the potential energy at the equilibrium point . The values of the parameters are
[TABLE]
which are from the table on p.8 of [19]. The total energy is given by
[TABLE]
Kolesnichenko and Shiriaev [7] introduce the following storage function:
[TABLE]
where the original term in Kolesnichenko and Shiriaev [7] has been replaced by in order to take the periodicity of angle into account. From the storage function, one can easily derive the following energy-based controller
[TABLE]
where
[TABLE]
See [7] for a detailed derivation.
The swing-up control law (5) contains four gain parameters, which are put in vector form as follows:
[TABLE]
According to Theorem 2 of [7], a sufficient condition on the parameters for successful swing-up is given by
[TABLE]
In the range $|q_{2}|\leq\ldots$, the swing-up controller (5) is switched to the LQR for the linearization of the system at the equilibrium point, with the weight matrices $Q=\operatorname{diag}([1,10,1,10])$ and $R=10000$.
To sum up, we swing up the rotary inverted pendulum relying on the energy-based controller (5). When the pendulum is in the region where the linearized model is effective, the LQR is turned on to hold the pendulum at the desired equilibrium point.
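The switching logic summarized above can be sketched as follows. The sketch assumes that the pendulum angle is zero at the upright equilibrium, that the state is ordered as (arm angle, pendulum angle, arm velocity, pendulum velocity), and that an LQR gain has already been computed for the linearized model; the names `wrap_angle`, `hybrid_control`, `swing_up`, `lqr_gain`, and `threshold` are hypothetical, and no numerical threshold or gain from the paper is implied.

```python
import math

def wrap_angle(q):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(q), math.cos(q))

def hybrid_control(state, swing_up, lqr_gain, threshold):
    """Return (control input, active mode) for the current state.

    state:     (q1, q2, dq1, dq2), with q2 = 0 at the upright position (assumed)
    swing_up:  callable implementing the energy-based law, state -> input
    lqr_gain:  sequence of four state-feedback gains (assumed precomputed)
    threshold: switching bound on |q2| in radians
    """
    q1, q2, dq1, dq2 = state
    q2w = wrap_angle(q2)                      # account for the periodicity of q2
    if abs(q2w) <= threshold:
        x = (q1, q2w, dq1, dq2)
        u = -sum(k * xi for k, xi in zip(lqr_gain, x))   # u = -K x
        return u, "lqr"
    return swing_up(state), "swing_up"
```

Wrapping the pendulum angle before the comparison ensures that the balancing controller takes over regardless of how many full turns the pendulum has made during swing-up.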
IV-B Optimization of Swing-Up Controller via Entropy Search
This section explains practical details about the optimization task to obtain an optimal swing-up controller. We first provide a common setup for simulations and experiments such as the range of parameters, a cost function, and the initial condition. The range of parameter vector is set as
[TABLE]
which defines the bounded parameter domain. The above range is determined on the basis of the following observations. In the controller formula (5), the energy term is relatively small due to the small values of the system’s physical parameters, so the gain on the energy term is chosen from a range of large numbers relative to the other gains. Moreover, the controller tends to work well when the parameter constrained by (6) is close to its lower bound, from which its range is derived. The ranges of the remaining two parameters are chosen, through several simulations, so that the controller works well once the first two parameters are set in the above ranges.
We set a cost function as follows:
[TABLE]
where is the initial time, is the terminal time, and we use the following state vector
[TABLE]
By introducing the initial conditions in the denominators, the cost value defined in (8) is less influenced by changes in the initial conditions, which makes cost values comparable over various initial conditions. For this reason, (8) is used to measure the performance of the controller in this paper.
The default initial condition for simulations and experiments is set as
[TABLE]
With the setting given above, we find a nominal controller by running 10,000 simulations in MATLAB Simulink, where the time span of each simulation is 30 seconds. Each simulation starts by choosing a gain parameter vector uniformly at random from the range (7) and ends by computing a cost value. After all the simulations are finished, the parameter vectors that result in the lowest simulated costs are tested in experiments to obtain their experimental costs. Through this procedure, the set of parameter values that yields the lowest experimental cost has been found as follows:
[TABLE]
which is used as the nominal parameter vector.
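The simulation stage of this nominal-controller search amounts to a uniform random search over a box. A minimal sketch is given below; `random_search`, `cost`, `bounds`, and `keep` are hypothetical names, and the cost passed in would be a closed-loop simulation in the setting of the paper, whereas any callable works for the sketch.

```python
import random

def random_search(cost, bounds, n_trials, keep=5, seed=0):
    """Sample parameter vectors uniformly from a box and keep the best ones.

    cost:     callable mapping a parameter list to a scalar cost
    bounds:   list of (low, high) pairs, one per parameter
    n_trials: number of random samples (10,000 in the text)
    keep:     number of lowest-cost candidates to return for further testing
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        theta = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append((cost(theta), theta))
    results.sort(key=lambda r: r[0])          # lowest cost first
    return results[:keep]
```

The returned short list plays the role of the candidates that are subsequently re-evaluated on hardware, from which the one with the lowest experimental cost is selected.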
We now find an optimal controller using Entropy Search. For the Gaussian process, we choose a constant prior mean and the rational quadratic kernel function
[TABLE]
with , and
[TABLE]
The hyperparameters of the Gaussian process have been determined from the results of several simulation runs and hyperparameter fitting [18].
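For reference, a common textbook form of the rational quadratic kernel can be written in a few lines of Python. The parameterization below (signal standard deviation `sigma_f`, shape parameter `alpha`, and one length scale per input dimension in `lengths`) is an assumption of this sketch and may differ from the exact form and hyperparameter values used in the paper.

```python
def rq_kernel(x, y, sigma_f, alpha, lengths):
    """Rational quadratic kernel k(x, y) = sigma_f^2 * (1 + r^2 / (2 alpha))^(-alpha),

    where r^2 is the squared distance after scaling each dimension by its
    length scale. Inputs x, y, and lengths are sequences of equal length.
    """
    r2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengths))
    return sigma_f**2 * (1.0 + r2 / (2.0 * alpha)) ** (-alpha)
```

Smaller values of `alpha` give heavier tails, so distant points remain more correlated; as `alpha` grows, the kernel approaches the squared-exponential kernel with the same length scales.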
Before initializing Algorithm 1 to perform Entropy Search, we run 5 simulations with the default initial condition (10) to form a set of initial observations (2) at a set of points (3). Once the two sets are formed, Entropy Search starts by running Algorithm 1. In line 11 of Algorithm 1, we use simulations to compute trajectories of the system driven by the controller, where a single simulation runs for 30 seconds with the default initial condition. The process is terminated when the posterior mean at the best guess has not changed by more than the termination threshold for the prescribed number of consecutive iterations, or when the maximum number of iterations is reached. After Algorithm 1 is completed, the resulting controller is verified in a simulation and an experiment, each lasting 30 seconds.
After 60 iterations, Entropy Search obtains the optimal parameter vector
[TABLE]
Figure 2 shows how Entropy Search has converged to the optimal parameter vector by iteratively evaluating a cost value and estimating the posterior mean at the best guess. Specifically, the upper part of Figure 2 plots, for each iteration, the cost value obtained at the suggested point following lines 9–12 of Algorithm 1. The lower part plots, for each iteration, the posterior mean at the best guess given in line 14 of Algorithm 1. As the iterative process goes on, the posterior mean at the best guess approaches a certain value, which indicates that the estimation model has been fit to the actual cost function over the iterations.
IV-C Performance Comparison
We have run two simulations for the default initial condition (10), one with the nominal controller and the other with the optimal controller, and have obtained the following cost values:
[TABLE]
from which it is deduced that the optimal controller yields a lower cost value than the nominal controller. Although the optimal gain has been obtained for the default initial condition, our exhaustive simulations show that it performs well for various initial conditions over a range of initial angles, with zero initial velocity. Figure 3 shows the cost values of the optimal controller and the nominal controller, sampled from the full set of costs computed in simulations and plotted over the initial conditions.
For the purpose of verification, we test the two controllers on the Quanser QUBE Servo 2 system for the following initial conditions:
[TABLE]
with the other states at zero.
For each initial condition, the cost value is computed by averaging the cost values of 5 repeated experiments. The results are plotted in Figure 4. It can be seen that the optimal controller produces a lower cost value than the nominal controller for each initial condition. The time responses of the two controllers for these initial conditions are measured in experiments and plotted in Figures 5, 6, and 7, respectively. It can be seen that the response with the optimal controller has a shorter settling time than that with the nominal controller for each initial condition. It follows that Entropy Search has succeeded in finding an energy-based controller that outperforms the nominal one, leading to quicker and firmer stabilization of the rotary inverted pendulum. The video of the experiments is available at https://youtu.be/JcmpLU5rJCg.
V CONCLUSIONS
The energy-based controller proposed in [7] is not only derived easily by considering the energy of the system but is also effective in stabilizing an underactuated nonlinear system. However, it still requires considerable effort, such as searching through a multidimensional parameter space, to find optimal parameter values. This paper proposes applying Entropy Search to the problem of finding the optimal gain parameter values of an energy-based swing-up controller for the Furuta pendulum system. Based on the results in Section IV-C, it is concluded that Entropy Search successfully optimizes the given controller so that the optimal controller attains better performance than the nominal controller. In the future, we will combine Entropy Search with a deep neural network [20] to enhance the performance of the controller.
REFERENCES
- [1] A. M. Bloch, D. E. Chang, N. E. Leonard, and J. E. Marsden, “Controlled Lagrangians and the stabilization of mechanical systems. II. Potential shaping,” IEEE Transactions on Automatic Control, vol. 46, no. 10, pp. 1556–1571, Oct. 2001.
- [2] D. E. Chang, “Stabilizability of controlled Lagrangian systems of two degrees of freedom and one degree of under-actuation by the energy-shaping method,” IEEE Transactions on Automatic Control, vol. 55, no. 8, pp. 1888–1893, Aug. 2010.
- [3] D. E. Chang, “The method of controlled Lagrangians: Energy plus force shaping,” SIAM Journal on Control and Optimization, vol. 48, no. 8, pp. 4821–4845, 2010.
- [4] W. Ng, D. E. Chang, and G. Labahn, “Energy shaping for systems with two degrees of underactuation and more than three degrees of freedom,” SIAM Journal on Control and Optimization, vol. 51, no. 2, pp. 881–905, 2013.
- [5] K. Åström and K. Furuta, “Swinging up a pendulum by energy control,” Automatica, vol. 36, no. 2, pp. 287–295, 2000.
- [6] A. L. Fradkov, “Swinging control of nonlinear oscillations,” International Journal of Control, vol. 64, no. 6, pp. 1189–1202, 1996.
- [7] O. Kolesnichenko and A. S. Shiriaev, “Partial stabilization of underactuated Euler–Lagrange systems via a class of feedback transformations,” Systems and Control Letters, vol. 45, no. 2, pp. 121–132, 2002.
- [8] I. Kamwa, G. Trudel, and L. Gerin-Lajoie, “Robust design and coordination of multiple damping controllers using nonlinear constrained optimization,” in Proceedings of the 21st International Conference on Power Industry Computer Applications. Connecting Utilities. PICA 99. To the Millennium and Beyond (Cat. No.99CH36351), May 1999, pp. 87–94.
