Adaptive Critic Based Optimal Kinematic Control for a Robot Manipulator
Aiswarya Menon, Ravi Prakash, Laxmidhar Behera

TL;DR
This paper introduces a novel adaptive critic control method for robot manipulators that guarantees convergence to optimality and stability, validated through simulations and real-world experiments with a 6-DOF robot.
Contribution
It proposes a new critic weight update law within the SNAC framework that ensures convergence and stability in kinematic control tasks.
Findings
Guaranteed convergence to optimal cost
Stable control of robot manipulator
Validated with real robot experiments
All authors are with the Department of Electrical Engineering, Indian Institute of Technology, Kanpur, India-208016.
Abstract
This paper is concerned with the optimal kinematic control of a robot manipulator whose end effector position follows a task space trajectory. The joints are actuated with the desired velocity profile to achieve this task. This problem has previously been solved using a single network adaptive critic (SNAC) by expressing the forward kinematics as an input-affine system. In SNAC, the critic weights are usually updated using the back-propagation algorithm, while little attention is given to convergence to the optimal cost. In this paper, we propose a critic weight update law that ensures convergence to the desired optimal cost while guaranteeing the stability of the closed loop kinematic control. When the robot is required to reach a specific target position, the kinematic control is solved as an optimal regulation problem in the context of SNAC. When the robot is required to follow a time-varying task space trajectory, the kinematic control is framed as an optimal tracking problem. For tracking, an augmented system consisting of the tracking error and the reference trajectory is constructed, and the optimal control policy is derived using the SNAC framework. The stability and performance of the system under the proposed weight tuning law are guaranteed using the Lyapunov approach. The proposed kinematic control scheme has been validated in simulations and executed experimentally on a real six degrees of freedom (DOF) Universal Robot (UR) 10 manipulator.
I Introduction
Modern robotic systems are becoming more complex as their degrees of freedom increase to address demanding application scenarios such as agriculture, health care and warehouse automation. A mobile manipulator with a Shadow hand mounted on it has 28 degrees of freedom. The inverse kinematics of such systems cannot be solved analytically. Thus neural and fuzzy neural network based schemes have become popular for designing kinematic control [1, 2, 3, 4]. However, these approaches cannot learn the inverse kinematics while optimizing a global cost function, which is very important when one deals with redundant manipulators. One approach to designing optimal kinematic control is to use the Hamilton-Jacobi-Bellman (HJB) formulation [5, 6]. The analytical solution of the HJB equation is still a major challenge, and as such these solutions are obtained off-line. Real-time approximate solutions are known as approximate dynamic programming and are presented in [7, 8, 9]. In these approaches, the framework uses two neural networks: an action network and a critic network. The action network learns to actuate an optimal policy, while the critic network evaluates the cost function through learning.
The single network adaptive critic (SNAC) was introduced in [10], where the critic is updated using back-propagation.
In [11], the kinematic control of a robot manipulator using this SNAC framework was presented. The block diagram of the kinematic control scheme for a robot manipulator using SNAC is shown in Fig. 1. As these schemes used back-propagation for critic weight updates, convergence to a near-optimal cost could be shown only after repeated training, through extensive simulations. Very few works in the literature have shown convergence to the optimal cost analytically along with a proof of stability. In this work, we solve the kinematic control problem for a robot manipulator with any number of degrees of freedom, where the tasks of reaching a fixed target position and following a time-varying task space trajectory are solved in the framework of optimal regulation and optimal tracking respectively using SNAC. A simple and novel critic weight update rule that ensures the stability of the closed loop system is proposed. The optimal kinematic control policy is developed using the HJB formulation for both optimal regulation and optimal tracking: the former makes the robot reach a fixed target position and the latter makes the robot follow a time-varying task space trajectory. The analytical proof of stability and convergence is carried out using the Lyapunov approach. As the degrees of freedom of robotic systems increase, learning based strategies will play a more important role than model based analytics [12], [13]. Thus the proposed approach has significant relevance in this context. The relevant literature is reviewed next.
Optimization based kinematic control has been accomplished using local optimization to find an instantaneous optimal solution [14, 15]. This approach is centered on the Jacobian pseudo-inverse and its null space: the redundancy is resolved by including constraints in the direct kinematic model or by projecting a particular solution onto the Jacobian null space. The other approach is global optimization, which uses an integral-type performance index along the whole trajectory [16, 17]. Here the redundancy resolution is converted into an optimal control problem, with the necessary conditions of optimality given by Pontryagin’s principle or by optimal control theory. Both of the aforementioned methods are suboptimal and require continuous pseudo-inversion of the Jacobian over time, which is computationally expensive and suffers from local instability problems [18, 19, 20].
The existing approach to optimally following a time-varying task space trajectory is to find the feedforward term using the dynamic inversion concept and the feedback term by solving an HJB equation [21]. However, such a solution is only near-optimal because of the feedforward term. In contrast, constructing an augmented system consisting of the tracking error and the reference trajectory, and using it to solve the HJB equation, results in an optimal control law that combines the feedforward and feedback control inputs [22].
The remainder of this paper is organised as follows. The problem formulation for optimal kinematic control is presented in Section II. Sections III and IV present the detailed mathematical derivation of the control scheme along with the stability proofs. In Section V, simulation and experimental results for kinematic motion control of a 6-DOF robotic manipulator are evaluated and compared with state-of-the-art kinematic control solutions. The paper is concluded in Section VI.
II Problem Formulation
The forward kinematics of a manipulator is a nonlinear transformation from joint space to Cartesian space, described by:
\[ x(t) = f(\theta(t)) \tag{1} \]
where $x(t) \in \mathbb{R}^m$ describes the position and orientation of the end effector in the workspace at time $t$, $\theta(t) \in \mathbb{R}^n$ is the joint vector, each element of which is a joint angle at time $t$, and $f(\cdot)$ is the nonlinear mapping. Because the mapping is nonlinear and redundant, it is usually difficult to obtain $\theta(t)$ directly for a desired $x(t) = x_d(t)$, where $x_d(t)$ is the desired end effector position. By contrast, the mapping from joint space to Cartesian space at the velocity level is affine. Taking the time derivative of both sides of (1) gives:
\[ \dot{x}(t) = J(\theta(t))\,\dot{\theta}(t) \tag{2} \]
where $J(\theta) = \partial f / \partial \theta \in \mathbb{R}^{m \times n}$ is the Jacobian matrix of $f$.
In this paper we focus on designing the joint angular velocity of the manipulator, which serves as the input to the tracking control loop of the robot. The manipulator kinematics can then be rewritten by replacing $\dot{\theta}(t)$ with the control input $u(t)$:
\[ \dot{x}(t) = J(\theta(t))\,u(t) \tag{3} \]
This gives the dynamics of the system. We propose to solve for an optimal control input $u(t)$ for the robot manipulator described by (3), such that the tracking error
\[ e(t) = x(t) - x_d(t) \tag{4} \]
for a given reference trajectory $x_d(t)$ reduces to zero with time.
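The velocity-level relation between joint speeds and end effector motion can be sketched numerically. The example below is only an illustration on a hypothetical planar two-link arm (not the UR10 used later in the paper); the function names `forward_kinematics`, `jacobian` and `step`, and the link lengths, are assumptions introduced here.

```python
import numpy as np

def forward_kinematics(theta, l1=1.0, l2=1.0):
    """End effector position of a planar 2-link arm (hypothetical example arm)."""
    t1, t12 = theta[0], theta[0] + theta[1]
    return np.array([l1 * np.cos(t1) + l2 * np.cos(t12),
                     l1 * np.sin(t1) + l2 * np.sin(t12)])

def jacobian(theta, l1=1.0, l2=1.0):
    """Analytical Jacobian of forward_kinematics with respect to the joint angles."""
    t1, t12 = theta[0], theta[0] + theta[1]
    return np.array([[-l1 * np.sin(t1) - l2 * np.sin(t12), -l2 * np.sin(t12)],
                     [ l1 * np.cos(t1) + l2 * np.cos(t12),  l2 * np.cos(t12)]])

def step(theta, u, dt):
    """One Euler step of the kinematics when joint velocities u are the control input."""
    theta_next = theta + dt * u
    return theta_next, forward_kinematics(theta_next)
```

For a small step, the finite-difference end effector velocity should agree with the Jacobian times the joint velocity, which is the input-affine structure the control design relies on.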
III Optimal Regulation
In this section, we address the optimal kinematic control of a robot manipulator, where the robot end effector has to reach a fixed target position. This is framed as an optimal regulation problem. In the context of kinematic control, optimal regulation means that the robot is made to reach a fixed target position following an optimal trajectory. This trajectory is generated by actuating the desired optimal control policy.
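Let $x_d$ be the constant desired end effector position. Differentiating (4) with respect to time, the error dynamics of the system are obtained as,
\[ \dot{e}(t) = J(\theta(t))\,u(t) \tag{5} \]
where $J$ is the manipulator Jacobian and $u$ is the joint velocity control input.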
III-A Formulation of control policy
The infinite horizon HJB cost function for (5) is given by,
[TABLE]
where the state and control weighting matrices are positive definite design matrices. The control input needs to be admissible so that the cost function (6) is finite. The Hamiltonian for this cost function with an admissible control input is,
[TABLE]
Here, the costate term is the gradient of the cost function with respect to the error state. The optimal control input which minimizes the cost function (6) also minimizes the Hamiltonian (7). Therefore, the optimal control is found from the stationarity condition of the Hamiltonian with respect to the control input and is obtained as
[TABLE]
The HJB equation may then be rewritten as follows,
[TABLE]
with the cost function .
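As a worked sketch of this step, the derivation below assumes a quadratic cost integrand without a leading factor of one half, error dynamics of the form $\dot{e} = J(\theta)u$, and the symbols $V$, $H$, $Q$ and $R$ as labels for the cost, the Hamiltonian and the two weighting matrices. These conventions are assumptions made here for illustration; a different scaling of the cost would change the constants $\tfrac{1}{2}$ and $\tfrac{1}{4}$.

```latex
\begin{align*}
% Assumed cost and error dynamics (notation is illustrative)
V(e(t)) &= \int_t^{\infty} \left( e^\top Q e + u^\top R u \right) d\tau,
  \qquad \dot{e} = J(\theta)\,u \\
% Hamiltonian
H(e, u, \nabla V) &= e^\top Q e + u^\top R u + \nabla V^\top J(\theta)\,u \\
% Stationarity with respect to u
\frac{\partial H}{\partial u} &= 2 R u + J^\top \nabla V = 0
  \;\Longrightarrow\; u^* = -\tfrac{1}{2} R^{-1} J^\top \nabla V^* \\
% Substituting u^* back into H = 0
0 &= e^\top Q e - \tfrac{1}{4}\, \nabla V^{*\top} J R^{-1} J^\top \nabla V^*
\end{align*}
```

The last line follows because the control penalty contributes $+\tfrac{1}{4}$ and the drift term contributes $-\tfrac{1}{2}$ of the same quadratic form in $\nabla V^*$.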
III-B Neural Network control design
By the universal approximation property, the optimal cost function is represented by a single-hidden-layer neural network with a nonlinear activation function:
[TABLE]
where the network is defined by the constant target NN weight vector, the activation function output and the function reconstruction error, and its size is set by the number of hidden neurons. The target NN weight vector and the reconstruction error are assumed to be upper bounded by positive constants.
[TABLE]
The approximate optimal cost function and its gradient are given by
[TABLE]
where the estimated weight vector is the NN estimate of the target weight vector. Using (8) and (11), the optimal control law can be written as:
[TABLE]
Considering (8) and (13), the estimated optimal control law is:
[TABLE]
The modified error dynamics using (15) is,
[TABLE]
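To illustrate how an estimated control input follows from the critic weights, here is a minimal sketch. It assumes the control law takes the form $\hat{u} = -\tfrac{1}{2} R^{-1} J^\top \nabla\phi(e)^\top \hat{W}$ (the form obtained for a quadratic cost integrand without a factor of one half) and uses a hypothetical quadratic monomial basis as the activation function. The names `quad_basis_grad` and `critic_control` are illustrative and do not come from the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_basis_grad(e):
    """Gradient (N x m) of the assumed basis phi_k(e) = e_i * e_j, i <= j."""
    m = e.size
    pairs = list(combinations_with_replacement(range(m), 2))
    grad = np.zeros((len(pairs), m))
    for k, (i, j) in enumerate(pairs):
        grad[k, i] += e[j]   # d(e_i * e_j) / d e_i
        grad[k, j] += e[i]   # d(e_i * e_j) / d e_j (doubles up when i == j)
    return grad

def critic_control(e, J, W_hat, R):
    """Estimated control input from critic weights (assumed control-law form)."""
    grad_V = quad_basis_grad(e).T @ W_hat        # estimated cost gradient, shape (m,)
    return -0.5 * np.linalg.solve(R, J.T @ grad_V)
```

As a sanity check under these assumptions: with an identity Jacobian and identity weighting matrices, the cost $V(e) = e^\top e$ satisfies the HJB equation, and the corresponding weights give the control input $u = -e$.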
Before proposing the critic weight tuning law and the stability proofs, the following assumptions are made:
Assumption 1: For the system represented by its error dynamics (5), with cost function (6), let there exist a continuously differentiable Lyapunov function candidate satisfying,
[TABLE]
Then, there exists a positive definite matrix, ensuring,
[TABLE]
During implementation, this Lyapunov function candidate can be obtained by selecting a polynomial in the error vector.
Remark 1: Based on the result of [23], the closed loop dynamics with optimal control law can be bounded by a function of the system state. In such situations, it can be assumed that with , . Combining (17) with the fact given, , it implies that Assumption 1 holds.
Moving on, a simple and novel critic weight tuning law is proposed.
[TABLE]
where the scalar gain is the learning rate of the critic network. With this weight tuning law, stability and performance are guaranteed theoretically using the Lyapunov approach.
III-C Stability Analysis
In this section, the stability of the system for optimal regulation is investigated.
Assumption 2: The Jacobian matrix is bounded as ,where is a positive constant. Here, the control coefficient matrix is the Jacobian matrix. Also, are bounded as and where and are positive constants.
The Lyapunov candidate function is selected as,
[TABLE]
[TABLE]
where, . Using (19),
[TABLE]
The system error dynamics of the system for the optimal control law is . Using the control laws (14) and (15),
[TABLE]
Using (23) in (22) and on simplification,
[TABLE]
Applying Assumption 1 and Assumption 2 here
[TABLE]
From (25), the following inequality may be derived:
[TABLE]
With the above condition to be true, and the system is stable in the sense of Lyapunov implying that and are both bounded. It may be expressed as : .
It may be also observed that the estimated optimal controller (15) converges to a neighborhood of the optimal feedback controller (14) with a finite bound as,
[TABLE]
Hence, it may be concluded that the instantaneous cost function is also bounded.
Taking time derivative of (24),
[TABLE]
Remark 2: All the terms in (28) can be shown to be bounded.
It may be observed that the second time derivative of the Lyapunov function is bounded, so Barbalat’s Lemma [24] can be invoked to conclude the asymptotic stability of the system and the convergence of the parameter estimation error and the weight estimation error towards zero. In other words, both errors vanish as time tends to infinity.
IV Optimal Tracking Control
In this section, the kinematic control of a robot manipulator following a time varying task space trajectory is solved in the framework of optimal tracking.
IV-A Formation of Augmented System
Let the time-varying reference trajectory be governed by the dynamics
[TABLE]
with a given initial condition, where the function on the right-hand side is Lipschitz continuous. Let the trajectory tracking error be defined as in (4), with its own initial condition. Considering (3) and (29), the tracking error dynamics are:
[TABLE]
Next, an augmented system is constructed as in [22] by stacking the tracking error and the reference trajectory into a single state vector with a corresponding initial condition. The augmented dynamics based on (29) and (30) can be formulated as:
[TABLE]
where $F(\cdot)$ and $G(\cdot)$ are the new system matrices.
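The construction of the augmented system can be sketched as follows. This is a minimal illustration, assuming the tracking error is defined as end effector position minus reference, so that the error dynamics are the Jacobian times the control minus the reference velocity. The circular reference dynamics, the constant `ROT90` and the function names are hypothetical choices made for this example.

```python
import numpy as np

ROT90 = np.array([[0.0, -1.0], [1.0, 0.0]])

def circle_reference_dynamics(x_d, omega=0.5):
    """Hypothetical reference dynamics: uniform rotation about the origin."""
    return omega * (ROT90 @ x_d)

def augmented_dynamics(z, u, J, ref_dynamics=circle_reference_dynamics):
    """Return (z_dot, F, G) for the stacked state z = [tracking error; reference]."""
    m = J.shape[0]
    x_d = z[m:]                            # reference part of the augmented state
    h = ref_dynamics(x_d)                  # reference velocity
    F = np.concatenate([-h, h])            # drift: error loses h, reference gains h
    G = np.vstack([J, np.zeros_like(J)])   # control enters only the error dynamics
    return F + G @ u, F, G
```

The zero block in the lower half of the control coefficient matrix reflects that the control input cannot influence the reference trajectory, only the tracking error.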
IV-B Formulation of Control Policy
The infinite horizon HJB cost function for the system in (31) is given by,
[TABLE]
where the state and control weighting matrices are positive definite. The control input must be admissible so that the cost function given by (32) is finite.
Applying the same methods of solving as in Section III-A, the control policy is obtained as,
[TABLE]
IV-C Neural Network control design
Here, the optimal cost function is constructed by a single-hidden-layer neural network with a nonlinear activation function:
[TABLE]
where the network is defined by the ideal weight vector, the activation function output and the function approximation error, and its size is set by the number of hidden neurons.
Using the same approach as in Section III-B, optimal control law and approximate control law may be obtained as,
[TABLE]
[TABLE]
Applying (36) to the augmented system dynamics (31), the closed loop augmented dynamics can be formulated as:
[TABLE]
The weight update law proposed for the critic of optimal tracking control is,
[TABLE]
where the scalar gain is the learning rate of the critic network.
Remark 3: Assumption 1 and Remark 1 hold here, with the augmented system states replacing the system error states as follows. For the augmented system (31), with cost function (32), let there exist a continuously differentiable Lyapunov function candidate satisfying,
[TABLE]
Then, there exists a positive definite matrix ensuring,
[TABLE]
During the implementation,.
IV-D Stability Analysis
In this section, the stability of the system is investigated.
Remark 4: Assumption 2 holds here. However, the control coefficient matrix is modified: the augmented control coefficient matrix is assumed to be bounded in norm by a positive constant.
The Lyapunov function candidate is selected as,
[TABLE]
[TABLE]
Using a similar approach as in Section III-C,
[TABLE]
Applying Remark 3 and Remark 4 here,
[TABLE]
From this, we observe that if the condition below holds, then the time derivative of the Lyapunov function is negative semidefinite, implying that the system is stable in the sense of Lyapunov.
[TABLE]
Hence, and are both bounded. It may be expressed as :
Also,
[TABLE]
It may be noted that the estimated control input converges to a neighbourhood of the optimal control input, since a finite bound exists just as in the case of regulation. The cost function is therefore also bounded.
Using the same approach as in Section III-C, it may be shown that the second time derivative of the Lyapunov function is bounded. Recalling Barbalat’s Lemma [25], it may be concluded that the system is stable and that the parameter estimation error and the weight estimation error converge to zero.
V Results and Discussion
In this section, we present numerical simulations followed by real-time experimental validation on a 6-DOF UR10 robot manipulator to demonstrate the effectiveness of the proposed kinematic control.
V-A Experimental Setup
Our experimental setup, shown in Fig. 2, consists of a UR10 robot manipulator with its controller box (internal computer) and a host PC (external computer). The UR10 is a 6-DOF robot arm designed to work safely alongside and in collaboration with a human. The arm can follow position commands like a traditional industrial robot, and can also take velocity commands to apply a given velocity along or around a specified axis. The low-level robot controller is a program running on the UR10’s internal computer that broadcasts robot arm data, receives and interprets commands, and controls the arm accordingly. There are several options for communicating with the low-level controller, including the teach pendant or a TCP socket (C++/Python) opened on a host computer. We used the open source C++ based UrDriver wrapper class integrated with ROS on a host PC to implement the proposed velocity based kinematic control scheme. The host PC streams joint velocity commands via URScript to the robot's real-time interface over Ethernet. The driver was configured at startup with the necessary parameters, such as the IP address of the robot, using the ROS parameter server.
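For readers unfamiliar with this interface, the sketch below shows one way a joint velocity command could be sent as a URScript `speedj` line over a TCP socket from Python. It is not the C++ UrDriver code used in the experiments. The port number (30003, commonly used for the UR real-time client interface), the acceleration and duration values, and the helper names `speedj_command` and `send_command` are assumptions for illustration only.

```python
import socket

def speedj_command(joint_velocities, acceleration=1.0, duration=0.008):
    """Format a URScript speedj line for six joint velocities in rad/s.

    The acceleration and duration defaults are hypothetical placeholders.
    """
    if len(joint_velocities) != 6:
        raise ValueError("expected six joint velocities")
    qd = ", ".join(f"{v:.5f}" for v in joint_velocities)
    return f"speedj([{qd}], {acceleration:.3f}, {duration:.4f})\n"

def send_command(host, command, port=30003, timeout=1.0):
    """Open a TCP connection to the robot controller and send one script line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
```

In a control loop, `speedj_command` would be called once per cycle with the latest control input; `send_command` needs a reachable robot controller and should only be used with appropriate safety measures in place.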
V-B Reaching a Fixed Position
Simulations are first performed to verify the kinematic control of the robot manipulator to reach a fixed position using the control law expressed in Equation (15). A typical simulation run from a random seed pose to a fixed target pose in Cartesian space is shown in Figure 3(a). The weights were initialized randomly, with n = 50 and k denoting the time instance. The predicted cost function was:
V-C Following a Time Varying Reference Trajectory
In this section, the kinematic control of the robot manipulator to follow a time-varying reference trajectory is validated. Simulations are first performed using the control law derived in Equation (36). A typical simulation, generated with a random seed pose and following a time-varying reference trajectory that moves at a constant angular speed along a circle, is shown in Figure 3(e). The weights were initialized randomly, with n = 10 and k denoting the time instance. In this work, we take the predicted cost to be:
V-D Observations
After a short transient, the error trajectory converges to zero in both cases, as shown in Fig. 3(b) and 3(f) respectively. Note that the control input remains within the maximum joint velocity limits, as shown in Fig. 3(c) and Fig. 3(g). However, unlike in optimal regulation, the control input in tracking does not converge to zero, since velocity compensation is required for perfect tracking. The time history of the associated cost function is shown in Fig. 3(d,h). The experimental validation for the corresponding target pose was performed using a real UR10 robot manipulator. The results in Fig. 4 show that the end effector of the UR10 successfully reaches the target pose under the proposed control scheme and successfully follows the time-varying circular reference trajectory starting from a random seed pose. The time history of the associated cost function is shown in Fig. 4(d,h). Unlike in simulations, the control effort shows some chatter due to the inertia of the real hardware.
V-E Quantitative Test Comparison
To quantify the performance of the proposed optimal kinematic control, an automated test process was used in which a large, statistically valid number of random samples served as inputs. The kinematic model of the Universal Robot (UR) 10 was used for the tests. The quantitative test methodology for comparing the proposed method against state-of-the-art kinematic control using an RNN [3] and the Singular Value Filtering (SVF) approach [26] is detailed below.
First, random samples and feasible elliptical trajectories were generated in the sample space for regulation and tracking respectively. The sample space is a cuboid volume of task space within the robot’s reach, and every sample is a pair of seed pose and target pose in the case of regulation. The three kinematic control schemes mentioned above are tested and compared on trajectory cost.
The trajectory cost is defined as the normalized total cost, computed with the same weighting matrices for all methods and normalized by the total number of sample instances. The design parameters were selected such that the maximum control effort is the same for all three approaches.
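A normalized trajectory cost of this kind could be computed as in the sketch below. The exact expression is not recoverable from the text, so this assumes the per-instance cost is the quadratic form in the tracking error plus the quadratic form in the control input, averaged over all sample instances. The function name `trajectory_cost` and its argument layout are hypothetical.

```python
import numpy as np

def trajectory_cost(errors, controls, Q, R):
    """Mean of e_k^T Q e_k + u_k^T R u_k over all sample instances k (assumed form)."""
    errors = np.asarray(errors, dtype=float)      # shape (N, m)
    controls = np.asarray(controls, dtype=float)  # shape (N, n)
    state_cost = np.einsum("ki,ij,kj->k", errors, Q, errors)
    control_cost = np.einsum("ki,ij,kj->k", controls, R, controls)
    return float(np.mean(state_cost + control_cost))
```

Using the same weighting matrices for every controller keeps the comparison fair, since the number then depends only on the trajectories each controller produces.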
[TABLE]
Table I: Comparison of different algorithms for kinematic control of the UR10 robot manipulator.
The cost contains two terms: one measuring the optimality of the control action and another measuring the smoothness of the motion. The comparison in Table I shows that the proposed kinematic control yields a more optimal control action than the state-of-the-art kinematic control schemes.
VI Conclusion
In this work, we have designed an optimal kinematic controller for a robot manipulator using the SNAC framework. A simple critic weight update law was proposed which ensures that the closed loop system is stable in the sense of Lyapunov while following an optimal trajectory. The robot was required to reach a target position or follow a time-varying trajectory in the task space while optimizing a global cost function. Using the proposed optimal regulation and optimal tracking framework in the context of SNAC, it has been experimentally demonstrated that the robot performs the desired tasks with optimal cost, as evident from Table I.
- [1] S. Kumar, L. Behera, and T. McGinnity, “Kinematic control of a redundant manipulator using inverse-forward adaptive scheme with a KSOM based hint generator,” Robotics and Autonomous Systems, vol. 58, no. 5, pp. 622–633, 2010.
- [2] P. Prem Kumar and L. Behera, “Visual servoing of a redundant manipulator with Jacobian matrix estimation using self-organizing map,” Robotics and Autonomous Systems, vol. 58, no. 8, pp. 978–990, 2010.
- [3] S. Li, Y. Zhang, and L. Jin, “Kinematic control of redundant manipulators using neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2243–2254, 2017.
- [4] I. Sirazuddin, L. Behera, T. McGinnity, and S. Coleman, “Image based visual servoing of a 7 DOF robot manipulator using an adaptive distributed fuzzy PD controller,” IEEE/ASME Transactions on Mechatronics, vol. 19, no. 2, pp. 512–523, 2014.
- [5] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
- [6] S. Keerthi and E. Gilbert, “An existence theorem for discrete-time infinite-horizon optimal control problems,” IEEE Transactions on Automatic Control, vol. 30, no. 9, pp. 907–909, 1985.
- [7] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA, 2005, vol. 1, no. 3.
- [8] Z. Chen and S. Jagannathan, “Generalized Hamilton–Jacobi–Bellman formulation-based neural network control of affine nonlinear discrete-time systems,” IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 90–106, 2008.
