Designing a Resilient Controller for Cancer Immunotherapy: Application to a Fractional‐Order Tumour‐Immune Model
Mohamadreza Homayounzade, Shayan Sajadian

TL;DR
This paper introduces a new control method for cancer immunotherapy that effectively eradicates tumors while maintaining immune cell levels.
Contribution
A novel robust control method using backstepping and deep reinforcement learning for cancer treatment is proposed.
Findings
The proposed control method achieves complete tumor eradication within 50 days.
The system maintains high levels of effector immune cells during treatment.
The controller demonstrates robustness against parametric uncertainty.
Abstract
In this paper, we propose a robust control method for the automatic treatment of targeted anti‐angiogenic molecular therapy based on multi‐input multi‐output (MIMO) nonlinear fractional and non‐fractional models using the backstepping (BS) approach. This protocol aims to eradicate tumour cells while preserving high levels of the body's natural effector cells and maintaining drug dosage within safe limits. The exponential stability of the controlled system is mathematically demonstrated using the Lyapunov theorem. Consequently, the tumour volume's convergence rate can be precisely controlled—a critical factor in cancer treatment. To fine‐tune the controller gains, a soft actor‐critic (SAC) algorithm within the framework of deep reinforcement learning (DRL) is employed, with a reward function designed based on the specific requirements of the system. Additionally, the Lyapunov theorem is…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1
FIGURE 2
FIGURE 3
FIGURE 4
FIGURE 5
FIGURE 6
FIGURE 7
FIGURE 8
FIGURE 9
FIGURE 10
FIGURE 11
FIGURE 12
FIGURE 13
FIGURE 14| Tumour growth model: |
concentration of effector cells concentration of tumour cells concentration of single cancer site |
| Input controller: |
control gains |
| Stability: |
‐ Exponentially stable, ‐ BIBO stable. |
| Tumour growth model: |
concentration of effector cells concentration of tumour cells concentration of single cancer site |
| Input controller: |
control gains |
| Stability: | ‐ Asymptotically stable |
| Proposed controller | Ref. [ | Ref. [ | Ref. [ | |
|---|---|---|---|---|
| Stability results | Exponential | Asymptotic | Asymptotic | Asymptotic |
| Robustness to uncertainties | ✓ | |||
| Requiring knowledge of disturbance | ✓ | ✓ | ✓ | |
| Time for tumour cells to converge to zero (days) | 50 | 280 | 300 | 150 |
| Remaining effector cells after therapy (cells/mL) | 27,000 | 19,900 | 20,150 | N/A |
| Parameter | Symbols | Values | |
|---|---|---|---|
| Antigenicity | |||
| Death rate of immune cells | |||
| The proliferation rate of immune cells | |||
| Half‐sat. for cancer clearance | |||
| Cancer growth rate | |||
| Logistic growth of cancer capacity | b | ||
| The half‐life of an effector molecule | |||
| The production rate of an effector molecule | |||
| Half sat. for proliferation term | |||
| Cancer clearance term | a | ||
| Half sat. of production |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematical Biology Tumor Growth · Gene Regulatory Network Analysis · Receptor Mechanisms and Signaling
Introduction
1
Cancer is one of the most predominant causes of death, with 19.3 million new cases globally, and is responsible for 10 million deaths in 2020 [1]. There are several therapy approaches to treat cancer, the most effective of which include immunotherapy, chemotherapy, radiotherapy, and surgery.
The development of mathematical models to represent tumour‐immune dynamics has gained significant interest in recent years, as these models offer a cost‐effective and safe method for testing hypotheses, simulating complex system behaviours, and assessing therapy outcomes. The goal is to develop a suitable protocol for drug administration based on the model, although tumour growth is an intricate process involving complicated interactions.
The ordinary differential equation (ODE) can be explored for modelling the interactions among immune cells, cancer cells, and drug dosage regimens. A mathematical model for tumour‐immune dynamics is presented in ref. [2], where four differential equations represent tumour growth, immune response, and chemotherapy treatment. Additional mathematical models have been proposed in refs. [3, 4, 5, 6, 7, 8]. The Kirschner–Panetta model [9], known for its simplicity, is widely recognised for describing tumour‐immune interactions and has been used to test various control methods and develop drug administration scenarios.
Various control strategies have been proposed to effectively kill cancer cells while minimising the negative side effects of drugs on healthy cells. A number of widely used control techniques include optimal control theory [10, 11], multiple model adaptive control [12], neuro‐fuzzy control [13], sliding mode control [14], and state‐dependent Riccati equation control [15, 16]. A mathematical fractional‐order model for tumour‐immune dynamics is presented in ref. [17], where an optimal control approach is implemented. The controller minimises the tumour size and provides only uniformly boundedness of tumour volume. A multi‐objective optimisation approach is studied in ref. [18] to determine the optimal drug administration protocol for cancer patients. The effects of optimal control theory on reducing cancer cells are investigated in ref. [19]. Despite its stability, tumour cells cannot be completely eradicated and exhibit a 100‐day recurrent cycle. A cancer treatment model based on nonlinear fractional optimal control problems is proposed in ref. [20], where generalised shifted Legendre polynomials are used to approximate the solutions of the model. An open‐loop and a closed‐loop optimal controller are proposed in ref. [21], based on the calculus of variations and the state‐dependent Riccati equation, respectively, to design an optimal drug injection protocol.
However, in both refs. [20, 21], the stability and robustness of the controlled system have not been analysed.
When dealing with complex dynamic systems such as cancer, the free‐model control method serves as a valuable approach. Although classification and detection can be achieved through model‐free approaches such as machine learning (ML) and artificial intelligence (AI) techniques [22, 23, 24, 25], these methods have certain constraints. Although they can produce closed‐loop solutions based on historical information, they fall short in monitoring the real‐time changes of various physiological parameters during a patient's treatment and also heavily rely on the availability of data. In ref. [26], adaptive fuzzy control has been used to determine the optimal chemoimmunotherapy treatment scenario. Fuzzy functions, however, are designed based on training data, which can be inaccurate and is not readily available. In addition, fuzzy logic controllers are based on predefined rules and, as a result, lack robustness against topological changes in the system, only being able to respond within the scope of their initial rule base. A neural adaptive control approach is pursued in ref. [27], where Nussbaum functions and neural networks are employed to address the problem of unknown control direction and uncertain nonlinear dynamics, respectively. A combination of immunotherapy and chemotherapy is proposed in ref. [28], where the Takagi‐Sugeno fuzzy controller is derived based on the Stepanova nonlinear model. However, tumour cells cannot be entirely eliminated. In ref. [29], a novel closed‐loop fuzzy logic controller based on intuitionistic fuzzy sets and an invasive weed optimisation algorithm has been proposed. A novel bounded‐gain‐forgetting composite adaptation law is used in ref. [30] to address the parameter uncertainty problem and a nonlinear composite adaptive control strategy has been implemented.
In ref. [31], using a recurrent neural network, a mathematical model of the cancer dynamics has been identified, and then the sliding mode control approach was employed to design the controller. However, the controller suffers from chattering. Optimal sliding mode control (SMC) is applied in ref. [32] to optimise drug delivery in chemotherapy, where the effects of obesity are also considered. Two non‐linear controllers using feedback linearisation and sliding mode approaches are proposed in ref. [33]. In addition, a genetic algorithm and a fuzzy tuner were employed to optimise the therapy parameters. The controller, however, lacks robustness against uncertainties and achieves only basic stability, which is considered the weakest form of stability.
Another powerful control approach widely used by researchers is the backstepping control method, which provides a systematic and flexible procedure for designing controllers for nonlinear system models. In ref. [34], for the multiple‐input multiple‐output (MIMO) cancer immunotherapy system, an adaptive fuzzy backstepping controller was proposed. In ref. [35], a model‐free controller, using an adaptive backstepping control approach and Legendre polynomials, was designed for cancer treatment; in contrast, Laguerre polynomials were utilised to deal with the uncertainty estimations in ref. [36].
In ref. [37], a mixed chemotherapy‐immunotherapy treatment protocol was developed by implementing a pseudo‐spectral control method. In ref. [38], a backstepping controller is combined with a sliding mode approach to develop the cancer treatment regimen.
A systematic review of 105 studies on optic pathway glioma management, presented in ref. [39], analysed patient demographics, treatment types, outcomes, and complications, highlighting significant limitations related to sample size, study heterogeneity, and challenges in assessing paediatric visual outcomes. An optimised approach utilising generalised Lerch polynomials is presented in ref. [40] to solve the fractional model of CAR‐T cell therapy for T‐cell leukaemia. This method effectively predicts a decrease in leukaemic T‐cells and suggests potential for enhanced treatment through sequential CAR‐T cell doses. In ref. [41], the development of a positive input dynamics method for tumour growth control was presented.
Optimal control theory is applied in ref. [42] to develop personalised cancer treatment plans, comparing strategies, such as Interior Point OPTimizer, State‐Dependent Riccati Equation, and Active Set Range Estimation to optimise chemotherapy and immunotherapy integration, aiming to minimise side effects and maximise therapeutic efficacy while considering individual patient differences. The mathematical modelling of cancer treatment is explored in ref. [43], focusing on chemotherapy as an optimal control problem to maximise tumour cell death while preserving normal cells, incorporating pharmacokinetics and validating results numerically to address drug resistance and toxicity challenges. Tumour‐immune cell growth is investigated in ref. [44] under chemotherapy drug diffusion using a Caputo fractional‐order model, employing the modified homotopy perturbation method to analyse spatiotemporal dynamics and highlighting the significant role of fractional order in understanding tumour‐immune interactions. In ref. [45] a novel control system combining a delayed‐output state observer and a backstepping‐based feedback controller is designed to address instability from measurement delays and positive input constraints in tumour growth control.
In light of the above‐mentioned problems, in this paper we develop a robust control method for the automatic treatment of targeted anti‐angiogenic molecular therapy based on multi‐input multi‐output (MIMO) nonlinear fractional and non‐fractional models using the backstepping (BS) approach. The BS approach is a recursive design procedure for stabilising the origin of a system in strict‐feedback form and guarantees the global asymptotic stability of feedback systems. The exponential stability of the controlled system is proved mathematically using the Lyapunov theorem. As a result, the convergence rate of the tumour volume can be controlled, which is a very important issue in cancer treatment. Moreover, the robustness of the system against parametric uncertainties is verified mathematically using the Lyapunov theorem. The main novelties of the proposed method are as follows:
- Unlike refs. [22, 23, 24, 25, 26, 27, 28, 29, 31, 34], which rely on data‐intensive machine learning models and rule‐based fuzzy approaches—both of which may face challenges in immunotherapy applications due to limited data availability and lack of robustness to changes and uncertainties—our approach leverages a backstepping (BS) method. This systematic, recursive design process ensures stability and robustness without the dependency on large datasets. In addition, unlike these model‐free approaches, the backstepping method is based on mathematical models, providing more physiological details that help physicians make more informed decisions.
- Unlike refs. [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38], where stability analysis is either overlooked or the proposed controller is asymptotically stable, our stability analysis is conducted mathematically using the Lyapunov theorem, proving that the controlled system is exponentially stable. This exponential stability enables more predictable convergence, allowing the time required for tumour cell eradication to be precisely determined and controlled. Consequently, clinicians can plan treatment timelines more effectively, ensuring a more targeted and reliable approach to cancer therapy.
- Unlike refs. [10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 30, 31, 32, 33, 34, 35, 36, 37], which only considers the integer model of cancer dynamics, our approach applies both integer and fractional order models. Given that cancer dynamics exhibit hereditary properties and may have a lower diffusion rate, fractional order models can more effectively represent these dynamics. Additionally, the stability of the fractional order model has been mathematically proven using the Lyapunov theorem.
- The system's robustness is either analysed solely through simulations or disregarded completely in refs. [18, 19, 20, 21, 26, 27, 28, 29, 35, 37]. In this paper, we conduct a rigorous mathematical analysis of the system's robustness using the Lyapunov theorem, demonstrating that the controlled system is bounded input‐bounded output (BIBO) stable when subjected to perturbations. Additionally, the robustness of the proposed controller is investigated through simulations, which examine 10%–100% uncertainties in the system parameters. The simulation results confirm that the closed‐loop system remains robust against parametric uncertainties.
- In this article, the proposed immunotherapy protocol completely eradicates the tumour cell concentration in approximately 50 days, whereas the protocols suggested by refs. [34, 38] require roughly 300 days to achieve similar results. In addition, unlike ref. [35], where it takes 150 days to eliminate the cancer cells and the controller fails to converge to zero for drug delivery—resulting in a permanent drug injection rate of 650 cells/mL—our controller demonstrates superior long‐term performance, with both tumour cells and drug delivery converging to zero within 50 days.
- Leveraging the training capabilities of deep neural networks (DNN), a DRL‐based tuning mechanism is implemented to optimise the proposed backstepping controller.
The rest of the paper is organised as follows: in Section 2, the tumour growth model is presented. In Section 3, the controller is presented. In Section 4, the stability of the equilibrium point and the BIBO (bounded input‐bounded output) stability of the system exposed to external disturbances and uncertainties are analysed mathematically using the Lyapunov theorem. In Section 5, the control method is applied to the fractional order model, and in Section 6, simulation results are presented. Finally, this paper is concluded in Section 7.
Tumour Growth Model
2
The tumour immune‐tumour model is presented in subsection 2.1, and the error systems are presented in subsection 2.2.
System Modelling
2.1
The differential equations governing the tumour immune‐tumour model are derived from the following [9, 35]:
in which c is the antigenicity of the tumour, μ2 is the immune cells' death rate, g1 is the half‐sat for proliferation term, g2 is the half‐sat for cancer clearance, b is the logistic growth of cancer capacity, μ3 is the effector molecule's half‐life, r2 is the growth rate of cancer, p1 is the immune cells' proliferation rate, p2 is the effector molecules' production rate, a is the cancer clearance term, and g3 is the half‐sat of production. Moreover, the model illustrates the mutual interactions among three key elements: the concentration of activated immune cells (x), often referred to as effector cells—such as cytotoxic T‐cells, macrophages, and natural killer cells that are toxic to tumour cells—the concentration of tumour cells (y), and the level of Interleukin‐2 (z). In addition, u1 is a treatment term which represents an external source of effector cells such as lymphokine‐activated killer (LAK), and u2 shows the influence of tumour infiltrating lymphocyte cells [9, 35].
Error Systems
2.2
In this subsection, first, the design procedure of the controller using the BS technique is described, and then the proposed controller is presented in Theorem 1.
Let us define the variable e by the following:
where x∗ represents the desired magnitude of x which will be designed properly in Section 3 through the BS approach. Considering the definition of e by Equation (2), we can rewrite Equation (1b) as follows:
As a result, x∗ can act as a guiding variable for controlling state y. In Section 3, the desired magnitude x∗ is designed properly using the BS approach such that the state y asymptotically converges to zero if x tends to x∗.
Differentiating definition (2), we obtain the following:
Substituting Equation (1a) in the result, we obtain the following:
As a result, the control input should be designed in a way that the state e converges to zero, that is, x tends to x∗.
Remark 1The controller method using the BS technique is as follows:
- The desired variable x∗ is appropriately designed such that the state y exponentially converges to zero if x tends to x∗.
- The input control law u1 is appropriately designed so that the tracking error e exponentially converges to zero, that is, x tends to x∗.
- The input control law u2 is appropriately designed so that the state z exponentially converges to zero.
Controller Design
3
The BS technique is used to design the desired magnitude x∗ as follows:
where the constant k1 is the positive control gain. Clearly Equation (6) can be rearranged as follows:
where
As mentioned in Remark 1, the costate x∗ is designed so that the state y exponentially converges to zero when x approaches x∗.
Substituting x∗ by Equation (6) in Equation (3), we obtain the following equation:
Differentiating Equation (7) with respect to time, we obtain the following equation:
Substituting Equation (9) in Equation (10), we obtain the following equation:
Substituting Equations (1a) and Equation (11) in Equation (4), we obtain the following equation:
Let us design the control input u1 by the following equation:
where A,B are previously defined by Equation (8) and k1,k2 represent control gains.
The control input law u1 is designed using the feedback linearisation technique to ensure that the tracking error e exponentially converges to zero. A detailed explanation of the feedback linearisation technique is provided in Remark 2.
Substituting Equation (13) in Equation (12), we obtain the following equation:
Let us design the control input u2 by the following equation:
where k3 represents the control gain.
The input control law u2 is designed using the feedback linearisation technique to ensure that the state z exponentially converges to zero. A detailed explanation of the feedback linearisation technique is provided in Remark 2.
Substituting Equation (15) in Equation (1c), we obtain the following equation:
Remark 2Feedback linearisation is a nonlinear control technique that transforms a nonlinear system into an equivalent linear system by applying a nonlinear state and input transformation. The core idea is to algebraically cancel the system's nonlinearities, enabling the use of conventional linear control methods. This is achieved by differentiating the system output until the control input explicitly appears, allowing for the design of a linearising control law. By doing so, the closed‐loop system behaves like a fully or partially linear system, simplifying stability and tracking control. While highly effective for precise control applications, feedback linearisation relies on an accurate system model and can be sensitive to uncertainties and external disturbances.
The schematic of the closed‐loop system is presented in Figure 1.
The schematic of the controlled system.
Theorem 1 The control inputs derived from Equations (13) and (15) exponentially stabilise the system, ensuring that the states y and z , as well as the costate e , exponentially converge to zero, that is, |y(t)|≤βe−12γt,|z(t)|≤βe−12γt, |e(t)|≤βe−12γt, where the positive constant γ adjusts the rate of convergence of system errors.
Remark 3It will be shown that by increasing the control gain k1,k2,k3, and/or decreasing the magnitude of α the rate of convergence of tumour volumes and inhibitor level decreases.
Stability Analysis
4
In subsection 4.1, we analyse the stability of the equilibrium point. Then, the BIBO stability of the controlled system in the presence of parametric uncertainties is analysed through the Lyapunov theorem. Consequently, the robustness of the system is assured using a high gain controller. The proposed BS controller is summarised in Table 1.
Using a similar approach as in Section 3, we can write the system error dynamics in the presence of uncertainties in system parameters and time varying disturbance as follows:
where d1,d2,d3 represent the deviation of system parameters from their nominal values.
Considering definition (26) and Equations ((17a), (17b), (17c)), in the presence of uncertainties, the closed‐loop error systems can be rearranged as follows:
where
The perturbation term ud(t,X) could result from modelling errors, ageing, or uncertainties and disturbances, which exists in any realistic problem. In a typical situation, we do not know ud(t,X), but we know some information about it, like knowing an upper bound on ‖ud(t,X)‖. Here, we represent the perturbation as an additive term on the right‐hand side of the state equation. Uncertainties that do not change the system's order can always be represented in this form. For if the perturbated right‐hand side is some function f∼(X), then by adding and subtracting f(X), we can rewrite the right‐hand side as f∼(X)=f(X)+f∼(X)−f(X) and define ud(t,X)=f∼(X)−f(X).
In Section 4.1, it is shown that the nominal system (18) has a uniformly exponentially stable equilibrium point at the origin. In Section 4.2, it is shown that in the case ud(t,0)≠0, the origin will not be an equilibrium point of the perturbed system. In this case, we study BIBO stability of the system.
Exponential Stability of Nominal System
4.1
If ud(t,0)=0 (in the presence of certain parameters) the perturbed system (18) has an equilibrium point at the origin. The main results regarding this research are mentioned in the form of the following theorem. In this case, we analyse the stability behaviour of the origin as an equilibrium point of the perturbed system.
Remark 4The Lyapunov theorem provides a robust framework for analysing the stability of an equilibrium point in dynamic systems. It is particularly advantageous for nonlinear systems, as it does not require linearisation or small‐signal approximations. By constructing a Lyapunov function—an energy‐like measure—it enables the assessment of both local and global stability. The theorem provides explicit and systematic conditions for stability, offering clarity in analysis. Moreover, it accounts for uncertainties and disturbances, ensuring robust stability. Overall, the Lyapunov theorem is a powerful tool for evaluating and ensuring the stability of equilibrium points in diverse and complex systems.
The control inputs By Equations (13) and (15), exponentially stabilise the equilibrium point of the nominal system, that is
Select the Lyapunov function
Differentiating Equation (21) with respect to time to obtain
Substituting Equations (9), (14), (16) in Equation (22) to obtain
Simplifying equal terms in Equation (23) we obtain the following equation:
Equation (24) ensures V˙≤0. Considering that
where
Definition. Let us define the Euclidean norm for an arbitrary vector X as
Considering that V is a positive definite function we have
where λ1,λ2 represent respectively the minimum and maximum eigenvalues of matrix M.
Considering Equation (24) we have the following equation:
where γ=mink1,k2,k3 is a positive constant.
Integrating both sides of Equation (29) to find
as well
Considering Equation (28), Equation (29) can be considered as follows:
Considering definitions (19a) and Equation (32) results
where β is a positive constant. Equation (33) implies that states y(t),e(t) and z(t) exponentially converge to zero, that is limt→∞y(t)=0,limt→∞z(t)=0,limt→∞e(t)=0
Remark 5Considering Equation (33) it can be observed that the parameter γ specifies the rate of convergence of y(t),z(t). Consequently, by adjusting the control gains k1,k2,k3, we can adjust the parameter γ and as a consequence, we can control the rate of convergence of states y and z.
Remark 6Considering x∗ from Equation (7) and the fact that limt→∞y(t)=0, it follows that x∗(t) tends to D as time increases. Specifically, x∗(t) approaches αr2ag2+k1ag2 over time. The parameters r2,a,g2 are inherent system parameters and cannot be directly designed. However, by selecting an appropriate value for the constant α, the steady‐state magnitude of the state x and the costate x∗(t) can be regulated. By examining the definition of u1 and u2 in Equations (13) and (15) and noting limt→∞y(t)=0, limt→∞e(t)=0 and limt→∞z(t)=0, it can be concluded that the steady‐state magnitude of the control input u2 converges to zero. Furthermore, the magnitude of u1 can be reduced by increasing α and decreasing the control parameter k1. However, as mentioned in Remark 2, increasing α and decreasing k1 will also result in a slower rate of convergence. Therefore, a trade off must be made between the rate of convergence and the allowable magnitude of the input u1. In Section 5.1, we employ an optimsation technique to determine the optimal control gains.
Robustness of the System Against Parametric Uncertainties and Time‐Varying Disturbance
4.2
Let us turn now to the more general case when we do not know that ud(t,0)≠0. The origin X=0 may not be an equilibrium point of the perturbed system (18), nor should we expect the solution of the perturbed system to approach the origin as t→∞. The main results regarding this research are mentioned in the form of the following theorem. This subsection examines how the closed‐loop system behaves when the system parameters are uncertain. The nominal value of system parameters is denoted by the “” notation.
Theorem 2 Let X=0 be an exponentially stable equilibrium point of the nominal system X˙=f(X) . Let V(X) be a Lyapunov function of the nominal system that satisfies (36) in D , where D=X∈R3|‖X‖<r . Suppose the perturbation term ud(t,X) satisfies
For the tumour growth system modelled by Equations ((1a), (1b), (1c)), the controllers are given by the following equation:
guarantees that the system is BIBO stable in the presence of uncertainties and time varying disturbance.
We use V(x) calculated by Equation (21) as a Lyapunov function for the perturbed system (18) which satisfies
For all X∈D for some positive constants λ1,λ2,λ3 and λ4.
The derivative of V(X) along the trajectories of (18) satisfies
Choosing θ<1, we have
where μ=δλ4/θλ3.
It is evident from the first of Equation (37) that if V≥λ2μ2, then ‖X‖≥μ, and
Hence
The above inequality remains valid over the interval t0,t0+T, provided V≥λ2μ2. Additionally, for t≥t0+t, we obtain
For t≥0, any X∈D, and a positive constant θ<1, it follows that for all ‖Xt0‖<λ1/λ2r, the solution X(t) of the perturbed system (18) fulfils the condition
and
Remark 7The ultimate magnitude of ‖X(t)‖, that is b, can be decreased by increasing the magnitude of control gains.
For a certain finite T, such that
Consider W=V(X), when V(x)≠0, use W˙=V˙2V and Equation (37) and consider Equation (36) to obtain the following equation:
When V(X,t)=0, the right derivative of W is calculated by using the following equation:
Also, we have
When V(X,t)=0, considering Equation (18) we obtain the following equation:
Consequently, we obtain the following equation:
Considering Equations (47) and (49), we have the following equation:
In other words
Thus
Therefore, for any V(X(t),t), we get
Using the comparison lemma, it follows that
Applying (25), we derive the following:
One can readily confirm that
Guarantee that ‖X(t)‖≤r, ensuring X(t) remains within the valid domain of the assumptions.
This outcome can once again be interpreted as a robustness characteristic of nominal systems with exponentially stable equilibria at the origin. It demonstrates that arbitrarily small, uniformly bounded perturbations do not cause significant steady‐state deviations from the origin. A high gain controller ensures the system's robustness. Table 1 presents the BS controller that we propose.
Remark 8BIBO stability ensures that for any bounded input, the system produces a bounded output, making it a fundamental criterion for practical system performance. This type of stability guarantees predictable and controlled system behaviour, which is critical in real‐world applications where inputs are often constrained. BIBO stability is particularly useful for analysing and designing systems subject to external disturbances, ensuring that the system remains well‐behaved under such conditions. It provides a straightforward and intuitive stability measure, especially for systems in control engineering and signal processing, ensuring reliable and safe operation.
Application to Fractional Order Model
5
In this section, because of the advantages of fractional order systems, the application of the method to Kirschner's fractional‐order model is shown. It is proved that the controller ensures stability of the system in the presence of fractional model. Therefore, its fractional model is expressed in the following equation:
where q is a fractional‐order operator set to 0.98 in simulations, u1 is an external source of effector cells, u2 represents an external input of effector cells.
Theorem 3 The control input calculated by Equations (13) and (15) stabilise the tumour growth system modelled by fractional order such that the states y,z and e asymptotically converge to zero.
Considering the definition of e by Equation (2), we can rewrite Eq. (57b) as follows:
Substituting x∗ by Equation (6) in Equation (58), we obtain the following equation:
Fractional time derivative of definition (2), results
Substituting Equation (57a) in the result, we obtain the following equation:
Fractional time derivative of x∗ by Equation (7) results
where A,B are calculated by Equation (8). Substituting Equation (59) in Equation (62), we obtain the following equation:
Substituting Equation (62) in Equation (61), we obtain the following equation:
Substituting the control input u1 by Equation (13) in Equation (64), we obtain the following equation:
Substituting the control input u2 by Equation (15) in Equation (57c), we obtain the following equation:
Choose the following Lyapunov function candidate:
Fractional order derivative of Equation (67) results
Substituting Equations (59), (65) and (66) in Equation (68) and simplifying the result, we obtain the following equation:
Equation (69) ensures DqV≤0. Based on the above relation, the proposed control method ensures stability of the closed‐loop system. Moreover, the error of the outputs converges to zero asymptotically.
The proposed BS controller is summarised in Table 2.
Simulation Results
6
In this section, the effectiveness of the presented controller is examined numerically. All simulations were performed in MATLAB software. Simulation results of similar research papers in the literature are presented for comparison. The design of a BS controller based on DRL is presented in Section 6.1. Furthermore, the exponential stability of our approach allows us to adjust the convergence rate (γ) of the controlled system. In Section 6.2, the results presented in this paper are juxtaposed with the results presented in ref. [35]. In Section 6.3, we compare the proposed results with the method presented in ref. [38] and the adaptive fuzzy controller presented in ref. [34]. Furthermore, we evaluate the robustness of the presented controller against uncertainties.
SAC‐Driven Parameter Tuning for BS Controller Development
6.1
The Markov Decision Process (MDP) serves as a widely used framework to structure reinforcement learning (RL) agents. It is characterised by a four‐element tuple, which defines its core components.
- State st represents a collection of variables that define the system at time step t.
- Action δt represents the control signal applied at time step t.
- Transition Probabilities ρ indicate the likelihood of the environment moving from the current state st to the next state st′ when action δt is applied.
- Reward rt indicates how close the current state is to the target state at time step t.
In the reinforcement learning framework, the primary objective is to maximise the expected cumulative reward, E∑k=0Tγkrt+k, accumulated from the environment over a time horizon T. The parameter γ∈[0,1] serves as the discount factor.
Soft Actor‐Critic Approach
6.1.1
The Soft Actor‐Critic (SAC) algorithm, a leading method in deep reinforcement learning, effectively addresses high‐dimensional continuous action spaces through the use of soft policy iteration. Unlike traditional actor‐critic approaches, SAC maximises the entropy of the policy independently from the cumulative reward, allowing for more robust exploration and stability.
where H(·|s) represents entropy, π∅ is the policy following the distribution ∅, and α is an adjustable parameter.
The SAC architecture, shown in Figure 2, consists of three key components: a critic network that estimates the Q‐value, an actor (policy) network that generates action signals, and an experience buffer for storing tuples of state transitions (D=st,δt,rt,st+1) used for training the networks. Additionally, a target Q network, which mirrors the structure of the Q network, is employed to enhance stability.
Structure of the BS controller developed using the SAC‐DRL algorithm for cancer immunotherapy.
Using the tuples from the replay buffer D, the Q network can be updated with the following equation:
where Vψ^st+1 represents the target network.
For the policy function, the actor network's weights are adjusted using Kullback–Leibler (KL) divergence, expressed as follows:
where DKL(·) represents the KL divergence and Ζπoldst denotes the logarithm of the partition function, which serves to normalise the distribution. By minimising the Kullback–Leibler divergence and disregarding the term Ζπoldst due to its minimal effect on the weight update for φ, the policy parameters are optimised as follows:
Utilising a Gaussian distribution, the actions in SAC are generated through the following policy function:
where εt∼N(o,I).
Subsequently, the updated policy function is defined as follows:
SAC‐DRL Utilising a BS Controller
6.1.2
In this application, the controller gains are treated as the optimisation targets for the SAC algorithm. The three control coefficients that need to be refined are as follows:
Given that the primary control objective is to stabilise the system, the external source of effector cells (u1(t)) and the influence of tumour infiltrating lymphocyte cells (u2(t)), and the concentration of tumour cells y(t) are considered as part of the system state.
In the context of the optimisation problem, the reward function is formulated as follows:
Here, σ1 and σ2 represent the weight coefficients. As specified, |u(t)−URef(t)| indicates that the SAC agent achieves u(t) a higher reward when the values of are closer to the nominal system input. Based on the term −σ2|u(t)−URef(t)|, the SAC agent incurs a significant penalty when u(t) deviates substantially from the nominal value, while only a minor penalty is applied for smaller deviations in u(t).
Comparison of the Results
6.2
The results of simulations between the proposed backstepping controller and the controller proposed in [35] are shown in Figures 3, 4, 5, 6. As it can be seen in Figure 3a, the concentration of the tumour cells in [35] has reached zero after a span of roughly 150 days; in contrast, Figure 3b shows a more effective eradication of the tumour cells after a span of 50 days. In addition to earlier elimination of the tumour cells, it can be observed in Figure 4b that IL‐2 has reached zero after a span of 40 days, which is more efficient compared to Figure 4a. The initial conditions for the system states are set as
Concentration of tumour cells: (a) represents concentration of tumour cells presented in [35], and (b) depicts concentration of the tumour cells presented in this paper.
Concentration of single cancer site: (a) represents concentration of single cancer site presented in ref. [35], and (b) depicts concentration of single cancer site presented in this paper.
Time history of external source of effector cells: (a) represents external source of effector cells presented in ref. [35], and (b) shows external source of effector cells presented in this paper.
Time history of tumour infiltrating lymphocyte cells: (a) tumour infiltrating lymphocyte cells presented in ref. [35], and (b) tumour infiltrating lymphocyte cells presented in this paper.
Consequently, the initial magnitude of u1,u2 is determined based on the control inputs. Since the control methods employed in the proposed approach and [35] differ, the initial magnitudes of u1 and u2 will also differ.
The backstepping controller curves of u1 and u2 are shown in Figures 5 and 6, respectively. In Figure 5a, the external source of effector cells settles at around 80 days, similar to our controller's response shown in Figure 5b. However, Figure 5a exhibits more oscillations in the transient response, while our model has a smoother response with fewer oscillations. Moreover, Figure 6b demonstrates that u2, after a short amount of time, has reached zero in a way that will efficiently lower the cost, whereas Figure 6a shows the inability of u2 to reach zero. As mentioned in Remark 6, as the states y and z, along with the costate e, converge to zero, the control input u2 tends to zero, and the steady‐state magnitude of u1 can be sufficiently reduced by properly tuning the control gains. The continuous and permanent application of an external source of effector cells to the system, without converging to zero, poses significant risks to patients and is not practical, as indicated by [46]. In this paper, we demonstrate that by appropriately assigning the gain α, we can reduce the magnitude of the external source of effector cells. For instance, in Figure 7a,b, we numerically analyse the effects of parameter α on the convergence rate of tumour cell concentration and the steady‐state magnitude of the external source of effector cells, as discussed in Remark 6.
Effect of parameter α on control performance: (a) tumour cell concentration, and (b) external source of effector cells.
Evaluation of Comparative Results
6.3
In ref. [38], Sarhaddi and Yaghoobi compared their method with the integer‐order model presented in [34]. It has been shown that the fractional model could be a better mathematical model for cancer dynamics compared to the integer‐order model because fractional derivatives have hereditary and memory properties of many physical phenomena. In this paper, for modelling the reciprocal effects of the tumour and immune system, the fractional‐order differential equations have also been used because of their aforementioned memory properties. A comparative display of the simulation results is presented in Figures 8, 9, 10. In Figure 8a, the concentration of effector cells in both models takes up to around 280–300 days to settle. In contrast, the concentration of effector cells in our model, shown in Figure 8b, reaches a fixed value after roughly 70 days. In Figure 9b, it can be observed that, using the presented controller, the tumour cells have reached zero after a span of 80 days, whereas Figure 9a shows that it takes nearly four times longer for tumour cells to reach zero compared to Figure 9b. Moreover, while it takes more than 350 days for IL‐2 cells of both models in Figure 10a to reach zero, Figure 10b shows that it has reached zero only after 40 days.
Concentration of the effector cells: (a) comparative demonstration of the concentration of the effector cells presented in ref. [38] and presented in ref. [34], and (b) concentration of the effector cells presented in this paper.
Concentration of the tumour cells: (a) comparative demonstration of the concentration of the tumour cells presented in ref. [38] and presented in ref. [34], and (b) concentration of the tumour cells presented in this paper.
Concentration of single cancer site: (a) comparative demonstration of the concentration of single cancer site presented in ref. [38] and presented in ref. [34] and (b) concentration of single cancer site presented in this paper.
In Table 3, results from refs. [34, 35, 38] and this paper are compared. As illustrated in the dynamic responses of Figures 8, 9, 10, the presented BS‐based controller demonstrates superior control performance, achieving quicker settling times for the dynamic responses. Although theoretically and numerically our immunotherapy protocol demonstrates strong performance, with the controlled system being exponentially stable and robust against uncertainties, clinical trials using this protocol could be conducted to fully understand its efficacy in real‐world scenarios.
Stability and Robustness Analysis Considering Uncertainties, Measurement Delays, and Discretisation
6.4
All system parameters used in the simulation are listed in Table 4. Kirschner et al. [9] derived these parameters by drawing data from medical literature and performing sensitivity analysis. For additional details on parameter derivation, please refer to [9]. To demonstrate the robustness of the system against parameter uncertainties, we simulate the proposed controller against 10%–100% uncertainties in the system parameters. The time history of the resulting tumour volume, inhibitor level, and injection rate are shown individually in Figures 11, 12, 13, 14. It can be seen that the controller is robust against uncertainties in system parameters. In addition, it can be observed that a higher control injection rate is needed to control the system against more uncertainties in the system parameters.
Concentration of tumour cells with 10%–100% system parameter uncertainties.
Time history of external source of effector cells with 10%–100% system parameter uncertainties.
Time history of external source of effector cells with 10%–100% system parameter uncertainties.
Effect of discretisation in the control performance: (a) a comparative demonstration of the concentration of the effector cells and (b) external source of effector cells.
In practice, the controller operates based on discrete measurements of system output as given by the following equation:
where xi=xti,yi=yti,zi=zti, represent the discrete state observations and u1i=u1ti,u2i=u2ti, represent the discrete inputs. The time instances are given by ti=t0+iΔt, where t0 is the initial time and Δt represents the time increment. The designed control inputs incorporate the system state variables, meaning that these variables must be continuously measured and fed back to the controller. To assess the robustness of the proposed method under discrete state measurements, we analyse its effectiveness in Figure 14a,b, considering the impact of discrete system state observations.
Remark 9In cancer control, activated immune cell concentrations are measured via blood tests, flow cytometry, immunohistochemistry (IHC), biopsy analysis, molecular/genetic techniques, and imaging. Tumour cell quantification, essential for diagnosis, treatment monitoring, and disease progression, employs biopsy/histopathology, flow cytometry, imaging, and molecular methods. Interleukin‐2 (IL‐2), a key cytokine for immune regulation and T cell activation, is assessed using ELISA, multiplex immunoassays, and flow cytometry to evaluate immune responses, immunotherapy efficacy, and inflammation. These methods collectively provide insights into immune activity, tumour burden, and cytokine dynamics in cancer research and management.
Conclusions
7
In this paper, a novel robust controller has been formulated for stabilising MIMO cancer immunotherapy with application to a tumour‐immunity fractional order model. Unlike previous studies, the controlled system is proved to be exponentially stable using the Lyapunov theorem. Because of the exponential stability, the convergence rate of the tumour volume can be controlled by adjusting the control gains. Furthermore, we mathematically demonstrate bounded input‐bounded output (BIBO) stability using the Lyapunov theorem and confirm the system's robustness against uncertainties through simulations with system parametric uncertainties ranging from 10% to 100%, proving that the system remains robust and stable. The simulation results show the effectiveness and superiority of the proposed controller compared to previous related studies, with tumour cells entirely eradicated within 50 days. Future studies could explore the efficacy of the proposed controller across other cancer therapy modalities.
Author Contributions
Mohamadreza Homayounzade: conceptualization, formal analysis, investigation, methodology, project administration, resources, software, validation, supervision, visualization, writing – original draft, writing – review and editing. Shayan Sajadian: formal analysis, investigation, methodology, software, validation, data curation, visualization, writing – original draft, writing – review and editing.
Conflicts of Interest
The authors declare no conflicts of interest.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1H. Sung , J. Ferlay , R. L. Siegel , et al., “Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries,” CA: A Cancer Journal for Clinicians 71, no. 3 (2021): 209–249, 10.3322/caac.21660.33538338 · doi ↗ · pubmed ↗
- 2L. G. De Pillis and A. Radunskaya , “A Mathematical Tumor Model With Immune Resistance and Drug Therapy: An Optimal Control Approach,” Computational and Mathematical Methods in Medicine 3, no. 2 (2001): 79–100, 10.1080/10273660108833067. · doi ↗
- 3M. Mamat , A. K. Subiyanto , and A. Kartono , “Mathematical Model of Cancer Treatments Using Immunotherapy, Chemotherapy and Biochemotherapy,” Applied Mathematical Sciences 7 (2013): 247–261, 10.12988/ams.2013.13023. · doi ↗
- 4O. G. Isaeva and V. A. Osipov , “Different Strategies for Cancer Treatment: Mathematical Modelling,” Computational and Mathematical Methods in Medicine 10, no. 4 (2009): 253–272, 10.1080/17486700802536054. · doi ↗
- 5F. A. Rihan and G. Velmurugan , “Dynamics of Fractional‐Order Delay Differential Model for Tumor‐Immune System,” Chaos, Solitons and Fractals 132 (2020): 109592, 10.1016/j.chaos.2019.109592. · doi ↗
- 6H. Hassani , Z. Avazzadeh , P. Agarwal , et al., “A Study on Fractional Tumor‐Immune Interaction Model Related to Lung Cancer via Generalized Laguerre Polynomials,” BMC Medical Research Methodology 23, no. 1 (2023): 189, 10.1186/s 12874-023-02006-3.37605131 PMC 10440950 · doi ↗ · pubmed ↗
- 7S. Bunimovich‐Mendrazitsky , H. Byrne , and L. Stone , “Mathematical Model of Pulsed Immunotherapy for Superficial Bladder Cancer,” Bulletin of Mathematical Biology 70, no. 7 (2008): 2055–2076, 10.1007/s 11538-008-9344-z.18716846 · doi ↗ · pubmed ↗
- 8J. Yang , S. Tang , and R. A. Cheke , “Modelling Pulsed Immunotherapy of Tumour–Immune Interaction,” Mathematics and Computers in Simulation 109 (2015): 92–112, 10.1016/j.matcom.2014.09.001. · doi ↗
