D-Optimal Input Design for Nonlinear FIR-type Systems:A Dispersion-based Approach
Alexander De Cock, Michel Gevers, Johan Schoukens

TL;DR
This paper introduces a convex optimization approach using dispersion-based methods to design D-optimal inputs for nonlinear FIR-type systems, ensuring realizability and efficient computation.
Contribution
It presents a novel dispersion-based optimization scheme for D-optimal input design in nonlinear FIR systems, with a graph-based method for realizability.
Findings
The proposed method converges monotonically to the optimal solution.
The approach is computationally faster than general convex optimizers.
Numerical examples demonstrate the effectiveness of the design.
Abstract
Optimal input design is an important step of the identification process in order to reduce the model variance. In this work a D-optimal input design method for finite-impulse-response-type nonlinear systems is presented. The optimization of the determinant of the Fisher information matrix is expressed as a convex optimization problem. This problem is then solved using a dispersion-based optimization scheme, which is easy to implement and converges monotonically to the optimal solution. Without constraints, the optimal design cannot be realized as a time sequence. By imposing that the design should lie in the subspace described by a symmetric and non-overlapping set, a realizable design is found. A graph-based method is used in order to find a time sequence that realizes this optimal constrained design. These methods are illustrated on a numerical example of which the results are…
| index | subsequence | frequency |
|---|---|---|
| 1 | [-1;-1] | 1/6 |
| 3 | [-1;-10/18] | 1/6 |
| 40 | [-1/3,1] | 1/6 |
| 61 | [1/3;-1] | 1/6 |
| 98 | [1;10/18] | 1/6 |
| 100 | [1;1] | 1/6 |
| index | subsequence | frequency |
|---|---|---|
| 1 | [-1;-1] | 0.15 |
| 4 | [-1;-1/3] | 0.13 |
| 10 | [-1,1] | 0.09 |
| 31 | [-1/3;-1] | 0.13 |
| 70 | [1/3;1] | 0.13 |
| 91 | [1;-1] | 0.09 |
| 97 | [1;1/3] | 0.13 |
| 100 | [1;1] | 0.15 |
| solution type | det(Fi) |
|---|---|
| max random signal | 9.51e+01 |
| unconstrained design | 1.83e+03 |
| unconstrained sequence | 1.37e+03 |
| constrained design | 1.17e+03 |
| constrained sequence | 1.17e+03 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsControl Systems and Identification · Advanced Control Systems Optimization · Probabilistic and Robust Engineering Design
D-Optimal Input Design for Nonlinear FIR-type Systems:
A Dispersion-based Approach
Alexander De Cock [email protected]
Michel Gevers [email protected]
Johan Schoukens [email protected] ELEC, Vrije Universiteit Brussel, 1050 Brussel, Belgium
ICTEAM, Louvain University, B1348 Louvain la Neuve, Belgium
Abstract
Optimal input design is an important step of the identification process in order to reduce the model variance. In this work a D-optimal input design method for finite-impulse-response-type nonlinear systems is presented. The optimization of the determinant of the Fisher information matrix is expressed as a convex optimization problem. This problem is then solved using a dispersion-based optimization scheme, which is easy to implement and converges monotonically to the optimal solution. Without constraints, the optimal design cannot be realized as a time sequence. By imposing that the design should lie in the subspace described by a symmetric and non-overlapping set, a realizable design is found. A graph-based method is used in order to find a time sequence that realizes this optimal constrained design. These methods are illustrated on a numerical example of which the results are thoroughly discussed. Additionally the computational speed of the algorithm is compared with the general convex optimizer cvx.
keywords:
System identification, Input design, Nonlinear systems, Convex optimization,
, ,
1 Introduction
The quality of identified models depends to a large extent on the experimental conditions under which the measurement data for these models are obtained. Therefore, experiment design is an important step in the identification process. This was recognized in the early days of the development of system identification theory, where significant attention was paid to the design of optimal experiments for the parametric identification of linear time-invariant systems. The focus was on the design of optimal inputs that maximize some scalar function of the Fisher information matrix under a constraint on the power of the input signal [30, 12]. The motivation is that if the parameter estimator is asymptotically efficient, then its covariance matrix converges to the inverse of the Fisher information matrix.
For linear stationary systems operating in open loop, the Fisher Information matrix is an affine function of the input spectrum. Therefore, the optimization is performed by first computing the optimal input spectrum, and then constructing an input signal for the experiment that realizes this optimal input spectrum [30, 12].
The development of identification for control, and more generally of application-oriented identification [17], together with the advent of powerful semi-definite programming tools, gave rise to novel optimal experiment design techniques for linear time-invariant systems. Experiment design methods were developed for closed-loop identification with a fixed controller [19] as well as for closed-loop identification with a “to be designed controller” [16], and for an ever wider range of possible design criteria and constraints. In addition, the dual problem of least costly identification was addressed, where the design aims at minimizing the cost of the identification experiment (say, in terms of the input signal energy used) subject to a constraint on the achieved model quality [1]. A survey of these results can be found in [10].
For linear time-invariant systems, the solution of these optimal experiment design problems consists of first expressing the criterion and the constraints in terms of a finite parametrization of the input spectrum (in the case of open loop design) or of the joint input-output spectrum (in the case of closed loop design), and reducing the problem into a convex optimization problem under Linear Matrix Inequality (LMI) constraints over these parameters. This yields an optimal spectrum. The second step then consists of constructing a stationary signal that has this desired spectrum. A few results have also been developed where the optimization is performed directly with respect to the input samples of the signal in time-domain; see e.g. [3].
Optimal experiment design for nonlinear systems is considerably more difficult, because the criterion is typically non-convex, making global optimization more difficult, if not impossible. The main difficulty is that the Fisher information matrix of the experiment is not only dependent on the second order moments of the input but also on higher order moments [18]. Thus, optimal input design (OID) results for nonlinear systems have so far been obtained only for simple design criteria in the form of scalar functions of the information matrix, and by restricting both the class of systems and the class of input signals.
One subclass of nonlinear systems for which a number of OID results have been obtained is the class of nonlinear finite-impulse-response-type (FIR-type) systems. Constructing an OID for an example system out of this class was performed for Gaussian signals in [11] and for deterministic signals taking only a finite number of possible values in [4]. The restriction to such multilevel excitation signals is actually common to almost all recent results on OID for classes of nonlinear systems. In [22] a solution is proposed for nonlinear FIR systems using a probabilistic parametrization of the multilevel input signals, while in [7] these results are extended to systems with fading memory using deterministic multilevel input signals which are then characterized by the relative frequency of each possible subsequence. A common feature of these results is the adoption of a linear parametrization of the probability distributions, respectively the relative frequencies, of the input signals, leading to a convex formulation of the optimization criterion.
The difficult step in all OID methods for classes of nonlinear systems proposed so far is to go from the optimal distributions (or optimal relative frequencies) to the generation of a realizable input sequence. A solution based on graph-theoretical properties has recently been proposed in [28]: realizable input sequences are obtained by first computing the elementary cycles of the associated graph, which define a convex polyhedron. As we shall see, a significant part of the present paper is devoted to this problem of signal generation; it is also based on properties of the associated graph. By imposing additional symmetry constraints on the admissible subsequences, we propose a suboptimal solution that is computationally cheaper than the solution based on elementary cycles described in [28]. Additionally, we show that the optimization time for our solution compares favorably with that of a general purpose convex optimizer cvx.
Methods considering a wider class of nonlinear systems also exist. For example in [13] a particle filter approach for the general class of nonlinear systems is presented, while [29] presents an OID for a block structured nonlinear model consisting of linear dynamic and nonlinear static blocks. However, unlike the design methods for nonlinear FIR-type systems, these methods do not result in a convex optimization problem and can therefore not guarantee convergence to a global optimum.
To conclude this general introduction, let us mention that OID for nonlinear systems is particular interesting in fields where the cost of single experiment is very high. Examples of practically applied OID for nonlinear systems can be found in the field of (bio)chemistry [27], cellular biology [5] and medicine [9]. An extensive overview of the current state-of-the-art can be found in [8]. Contribution and relation with other work
This work is restricted to nonlinear FIR-type systems of memory length , whose inputs are deterministic input sequences of length , of which the sample values can only take a finite set of possible values: . Therefore each output sample of the system depends on a subsequence of the input sequence, and the total number of different subsequences that can be presented to the system is limited to a finite set of possible sequences. The Fisher information produced by a given input sequence is therefore completely determined by the subsequences it contains. For each input sequence of length we can define a corresponding frequency vector that indicates how many times each subsequence is present in the sequence. It will then be shown that the Fisher information matrix can be written as a convex combination of elementary Fisher information matrices, where the coefficients are the relative frequencies that indicate how many times each subsequence appears in the input.
Different scalar functions can now be used as measure of information. As long as it is a matrix nondecreasing function [2] the problem remains convex and the global optimum can be computed. In this paper we consider D-optimal designs, meaning that the determinant of the Fisher Information matrix is maximized. In addition we have opted for a dispersion-based method, which was already successfully applied in the linear case [21]. Advantages of this method are its intuitive interpretation, straightforward implementation and monotonic convergence to the global minimum. In addition we will illustrate with a short numerical example that the dispersion algorithm can compete with general purpose convex optimizers like cvx [24] in terms of computation time.
The minimization of the maximum of the dispersion function yields an optimal relative frequency vector. As already explained, this optimal frequency vector may not correspond to a realizable time sequence. To alleviate this problem, constraints need to be incorporated into the optimization problem. The space of frequency vectors which satisfies these constraints forms a polyhedron [28]. This allows one to represent every frequency vector in the search space as a convex combination of a set of corner points. Unfortunately, computing this set of corner points is numerically expensive. Therefore, an alternative way that approximates this set is proposed in this paper. It is based on restricting the search space to a constrained set of non-overlapping symmetric frequency vectors.
Once a realizable and optimal frequency vector is found, a time sequence satisfying this design needs to be derived. To this end a graph-based method was suggested in [22] and later elaborated in [28] for stochastic input sequences. In this work a similar graph-based method is presented for deterministic input sequences. The problems addressed in this paper have close connections with those addressed in [22, 28] and [7]. The main difference with [22, 28] lies in the fact that in these papers stochastic inputs are considered and that the problem is parametrized with respect to the probability density of the subsequences, instead of their relative frequency. While this makes little difference when it comes to the numerical computation of the designs, the interpretation of the results is quite different. Moreover, in [28] no relation was made between the memory of the system and the length of the subsequences used in the input generation method. We study this relation in Section 8, where we show that the designed input is only optimal if both memories are equal. If the memory of the generation method is shorter than the memory of the system, the search space becomes too restrictive. If the memory of the generation is chosen longer, the same results are obtained but at a higher computational cost.
In [7] an optimal deterministic input is computed for the class of fading memory nonlinear systems. However, unlike this paper, no sequence generation method was presented in [7].
In addition to the use of a simple dispersion-based criterion, the use of an alternative search space spanned by a non-overlapping symmetric set that considerably reduces the computation time, another novel contribution of this paper is the discussion of the interaction between the system memory and the subsequence length. Overview
The paper is structured as follows. Section 2 formalizes the optimal input design problem for the considered class of systems. Section 3 shows how the associated optimization problem can be solved based on the dispersion function. In Section 4 the problem of signal generation is considered, and a graph-based method is proposed. Section 5 discusses how the constraints needed for signal generation can be incorporated into the optimization. Section 6 illustrates our method on a numerical example. In Section 7 the computation time of the dispersion-based method is compared with cvx. In Section 8 we motivate why the subsequence length should be chosen equal to the memory length of the system. Section 9 summarizes the obtained results.
2 Problem Statement
The goal of this paper is to find the most informative input of given length for a nonlinear FIR-type system (as defined in Assumption 1 below) with a known model structure, and disturbed with independent Gaussian output noise. As a measure of information the determinant of the Fisher information matrix is considered. Therefore the design is called D-optimal. When the parameters are identified with such an input sequence, and the estimator is assumed efficient, the volume of the uncertainty ellipse in the parameter space is minimal [30]. The following assumptions describe this problem formally.
Assumption 1**.**
The considered system is a member of the class of nonlinear FIR-type systems with memory length and which are differentiable with respect to the parameters of the system. This model class was first studied in [22] in the context of optimal input design.
[TABLE]
*where is the noiseless input, is the noiseless output, are the parameters of the model. Notice that the output at time only depends on the current input sample and previous input samples.
Additionally it is assumed that the system is identifiable with respect to the parameters, meaning that there exists an input sequence such that the outputs and of the corresponding models (1) are identical only if .*
Assumption 2**.**
The class of inputs will be restricted to deterministic time sequences with a length of samples, whose amplitude can only take values from a finite, predefined set of values:
[TABLE]
Assumption 3**.**
The output of the true system is obtained as the sum of a noise-free output defined by (1) with a “true” parameter vector and some additive independent identically distributed (i.i.d) Gaussian noise . The noise is also independent of the input signal .
[TABLE]
Criterion The D-optimality criterion is used, which means that the optimal input sequence of length corresponds to the sequence for which the determinant of the information matrix is maximal:
[TABLE]
Remark 1**.**
Since the noise variance only scales the Fisher information matrix, it will not alter the optimal input design.
Given the noise assumption, can be computed based on the time domain data as (see [25]):
[TABLE]
where is a matrix containing the partial derivatives of , and stands for the transpose of .
Remark 2**.**
For the computation of the optimal input, it will be assumed that the true parameters of the system, , are known. While this may seem in contradiction with the final goal of system identification it is a standard assumption in the field of optimal input design [30].
3 Problem Solution
In order to solve the D-optimal input design problem, as presented in the previous section, three important steps will be made. First, the concept of the -length subsequences is formally introduced. Second, it will be shown that the Fisher information matrix can be expressed as a weighted sum of the Fisher matrices associated to each subsequence. This property is the key to solve the optimization problem efficiently. Third, it will be shown how the problem can be solved with a dispersion-based method similar to the one used in the linear case, as described in [12, 21].
3.1 Subsequences
Considering the system model, it is clear that each output sample of the system depends only on a subsequence of the input sequence.
Definition 1**.**
A subsequence is an ordered set of values each, of which is drawn out of the predefined set . In total different subsequences can be defined. The space that contains all possible subsequences will be called .
Notation 1**.**
Each index set with can be made to correspond to a unique integer index defined as follows:
[TABLE]
Notice that is the representation of in base , that ranges from to , and that (4) defines a one-to-one mapping between and the index set . Thus, the mapping (4) establishes the equivalence:
[TABLE]
In the remainder of the paper we use indistinctly the notation or for a function of the index set, where and are related by (4).
Notation 2**.**
In order to be able to refer to a specific subsequence from , the following notation is introduced:
[TABLE]
Applying the mapping introduced in Notation 1, leads to the following shorthand notation .
3.2 Fisher information and frequency vector
From (3) it is clear that the element of the Fisher information matrix can be written as a sum over the time samples:
[TABLE]
where the function corresponds to the product of the partial derivatives of with respect to the and parameter, evaluated at time instant for a given input sequence .
Because the model class was restricted to FIR-type systems the functions depend at most on successive input values.
[TABLE]
In other words, the function depends on the subsequence that ended at time t.
Remark 3**.**
Notice that the first terms depend upon samples which are not measured. Their value is set by the initial conditions. It will be assumed that the signal is periodic with period . This allows us to determine the values of these unknown samples.
Since the number of possible subsequences is fixed, the number of different terms in (6) is also fixed. If we compute the values of for each possible subsequence we can reorder the sum over time such that we obtain a weighted sum over all possible subsequences:
[TABLE]
The weight indicates how often the subsequence occurs in the sequence and is therefore called the frequency of the sequence . From (8) it is clear that the Fisher information matrix for a given input sequence is completely determined by the subsequences that contains.
Definition 2**.**
The Fisher information matrix corresponding to the subsequence will be called the elementary Fisher matrix:
[TABLE]
where the partial derivatives are evaluated for the subsequence .
Definition 3**.**
The vector , containing the number of times each subsequence occurs in the input design, is called the frequency vector of the design.
Remark 4**.**
Given a time sequence , the corresponding frequency vector can be obtained by counting the number of times each subsequence occurs in the signal . The subsequences are counted as depicted in Fig.1. Notice that a signal with N samples contains N subsequences, due to the assumed periodicity of the signal (see Remark 3).
Normalizing (8) by the number of subsequences allows us to rewrite the normalized Fisher information matrix as a convex combination of the elementary matrices with the relative frequencies as convex coefficients:
[TABLE]
Notice that only the coefficients depend upon the particular design used in the input . The elementary information matrices are independent of the design and can be computed a priori given the amplitude set and the memory of the FIR type nonlinear system.
Definition 4**.**
The frequency vector divided by the total number of subsequences in the design is called the relative frequency vector . By construction the normalized frequencies have the properties of convex coefficients, meaning that their values range from 0 to 1 and that their sum is exactly one:
[TABLE]
Remark 5**.**
By performing the optimization for the relative frequency vector , we relax the problem by making the search space continuous. However, the frequency vector can only contain natural numbers. So, after denormalizing the frequency vector, the values will be rounded to the nearest natural number. This may cause a slight decrease in information.
Computing an optimal experiment consists of finding the vector that maximizes the determinant of the normalized information matrix:
[TABLE]
3.3 Dispersion function
In the previous subsection it was shown that the normalized Fisher information matrix can be rewritten as a convex combination of known elementary information matrices. We now show that this allows us to use a dispersion-based method in order to perform the optimization. Instead of solving the optimization problem directly, an equivalent problem is solved where the maximum of an auxiliary function, called the dispersion function, is minimized. The dispersion function, also called response dispersion, was introduced in the experiment design for identification arena in [12].
Definition 5**.**
With the notations introduced above, the dispersion function is defined as:
[TABLE]
where is the information matrix computed for the given design , and is the information matrix corresponding to the subsequence.
Some useful properties of the dispersion function are [12]:
- •
The maximal value of the dispersion can never be smaller than the number of independent parameters in the model .
- •
For any design , the inner product equals the number of free parameters in the model.
- •
The dispersion function can also be interpreted as a normalized variance of the estimated model.
Theorem 1**.**
The following characterizations of an optimal design are equivalent:
* maximizes * 2. 2.
* minimizes * 3. 3.
**
*where is the number of independent parameters in the model.
Proof: see [12] Chapter 6, page 147.*
Theorem 1 states that the design that maximizes the determinant of the Fisher information matrix is the same design that minimizes the maximum of the dispersion function. Since a simple and efficient algorithm exists that solves the latter problem, we shall adopt it for the computation of our optimal experiment.
3.4 Optimization Algorithm
In [21] a simple and stable, monotonically converging algorithm is presented, which finds the design that minimizes the maximum of the dispersion function. This algorithm can be summarized in four steps:
Initialize with a uniform design: 2. 2.
Compute the dispersion function for the current design using (12) 3. 3.
Update the design in accordance with the dispersion function as follows 4. 4.
Stopping criterion: if is smaller than a predefined threshold, the optimal solution is assumed to be found; else go to step 2.
The stopping criterion is based on the third expression of Theorem 1. The monotonic convergence of the algorithm is proven in [31]. In Section 7 the performance of this dispersion-based algorithm will be compared to the general purpose convex optimizer cvx.
Remark 6**.**
Notice that once a particular frequency becomes zero for some , this frequency remains zero for all subsequent iterations. Therefore, the computational speed of the algorithm can be improved by only updating the nonzero frequencies. This avoids unneeded evaluations of the dispersion function.
4 Signal Generation
The optimal frequency vector can be interpreted as an experiment of measurements, where each measurement consists of applying a single subsequence to the system and measuring the corresponding output sample.
A naive way to perform the optimal design is to concatenate all the subsequences contained in (i.e. all the subsequences for which ), and only measure the output samples which correspond to these subsequences. This means that a signal with samples is used at the input, in order to collect samples at the output. Clearly this is not an efficient approach, since only out of the output samples are used for parameter estimation.
It would be better to generate a periodic input sequence with a period length of samples, containing the needed subsequences. However, not every frequency vector has a corresponding input sequence of length because the last inputs of subsequence for which may not correspond to the first inputs of another subsequence for which . In order to derive conditions on the frequency vector which guarantee the existence of at least one realizable time sequence, a sequence generation method will be introduced. This generation method will correspond with a path through the associated graph of the frequency vector.
Definition 6**.**
Given a subsequence of length n, the right subsubsequence is defined as the subsequence of length n-1 obtained by removing the first element of the original subsequence. Similarly, the left subsubsequence is defined as the subsequence of length n-1 obtained by removing the last element of the original subsequence.
Definition 7**.**
The graph associated with a frequency vector (see Notation 1) is defined as follows:
- •
The graph contains nodes, each containing a different subsubsequence of length n-1
- •
Each subsequence of length n corresponds to a directed edge connecting its left and right subsubsequence. The edge starts in the left subsubsequence and ends in the right one.
- •
Each edge has a multiplicity which corresponds to the subsequence frequency of its corresponding subsequence.
Example 1**.**
Consider a FIR-type system with and two amplitude levels . In total different subsubsequences of length n-1 can be defined. So the associated graph contains four nodes. Additionally different subsequences of length n can be defined. This means that the graph contains eight edges. For example, the edge corresponding to the subsequence connects its left subsubsequence with its right subsubsequence . The multiplicity of the edge connecting with equals its frequency . If this reasoning is repeated for every subsequence, the graph in Fig.2 is obtained.
If there exists a time sequence corresponding to the frequency vector , then this sequence can be obtained by the following steps:
Construct the graph associated with the frequency vector 2. 2.
Find a path, starting from an arbitrary node, that uses each edge exactly as many times as its multiplicity indicates. 3. 3.
For every edge in the path, add the last amplitude value of the corresponding subsequence to the end of the input sequence.
From the above, it can be concluded that a time sequence exists if there exists a path, starting from an arbitrary node, that uses each edge exactly as many times as its multiplicity indicates. In graph theory such a path is called an Euler cycle. Some well known algorithms for finding the Euler cycle (if it exists) are the algorithm of Fleury [6] and the algorithm of Hierholzer. More recent versions of these algorithms can be found in any textbook on graph theory [15].
In order for an Euler cycle to exist, the associated graph can not be disjoint and all vertices need to have a zero degree, which means the sum of multiplicities for all outgoing edges minus the sum of multiplicities for all incoming edges needs to be zero.
Theorem 2**.**
A periodic time sequence exists that realizes a prescribed frequency vector only if its associated graph is not disjoint and the frequency vector satisfies the following conditions:
[TABLE]
or equivalently, using the scalar index rather than the vector index , as defined in Notation 1:
[TABLE]
Proof:* A time sequence has the correct frequency vector if a path can be constructed from the associated graph whereby each edge is used exactly as many times as its multiplicity indicates. In order for such a path to exist, the sum of the multiplicities of the outgoing and incoming edges need to be equal in every node, where each node is defined by a vector index . This is precisely the constraint (2). Now fix a subsequence , and let denote the scalar index for this subsequence, i.e.*
[TABLE]
We now express the two indices of length n appearing in (2) as a function of :
[TABLE]
With exactly the same procedure one gets
[TABLE]
Remark 7**.**
In order to illustrate that the equations in (2) are not sufficient conditions for the existence of a realizable time sequence for a given a frequency vector, consider a disconnected graph which satisfies the constraints. In such a graph there is no single path connecting all the nodes, meaning there is also no corresponding time sequence.
Remark 8**.**
In a stochastic framework, the frequency matrix can be considered as mutual discrete probability distribution functions of the n stochastic variables. Imposing that the signal is stationary, will lead to the same constraint as given in (2) [22]. The graph described above can then be seen as a Markov chain used to generate a realization of the frequency matrix.
5 Constrained Optimization
In the previous section it was shown that a frequency vector can only be realized as a time sequence if it meets the conditions (2). Therefore, these conditions will be imposed during the optimization. Unfortunately the dispersion-based algorithm cannot handle constraints directly. Instead, the realizable search space will be represented as a convex set and the optimization will be performed with respect to the new convex coefficients.
We now show that the set of relative frequency vectors meeting the condition (2) are a convex set. First, it should be noted that the relative frequencies are convex coefficients (see (4)). Therefore the full search space is contained in a convex polyhedron [2]. Second, the constraints presented in (2) correspond to a set of linear equality constraints, meaning they describe a subspace centered at the origin. Combining these two observations shows that the space of realizable relative frequency vectors consists of the intersection between a polyhedron and a subspace, which in turn is again a polyhedron [2] and therefore a convex set.
Knowing that the search space is a convex polyhedron allows us to express every realizable relative frequency vector as a convex combination of the corner points:
[TABLE]
where are the mentioned corner points, is the number of corner points, are the new convex coefficients with respect to which the optimization will be performed, and is the frequency vector corresponding to the coefficient vector .
In [28] the same reasoning was followed. Additionally, it was shown how the corner points that span the polyhedron could be constructed from the elementary cycles present in the graph associated to the unity frequency vector. However finding these elementary cycles is a computationally heavy task and becomes infeasible for medium sized graphs [14].
5.1 Non-overlapping Symmetric Set
As an alternative, we present a more restrictive convex set that has the advantages that its corner points can easily be computed. However, it can not be guaranteed that this set contains the global optimum of the polyhedron of realizable frequency vectors.
Definition 8**.**
A set of vectors in is called non-overlapping symmetric if they have the following four properties:
*positivity constraint: *
** 2. 2.
*non-overlapping constraint: *
** 3. 3.
*unity sum constraint: *
** 4. 4.
*symmetry constraint: *
* *
**
where indicates the number of vectors in the set, stands for the set of all possible permutations of the symbols , and means ’there exists only one’.
Remark 9**.**
The first property guarantees that the frequencies are nonnegative. The second property guarantees that the vectors don’t have the same nonzero element positions, thereby making these vectors linearly independent. The third property ensures that the sum of the frequencies is one. The first three properties are needed in order to impose (4) on . The fourth property imposes symmetry on the frequency vector. This symmetry constraint implies that the constraint (2) is satisfied.
Remark 10**.**
The number of non-overlapping symmetric vectors equals the number of degrees of freedom in a symmetric tensor of order and dimensionality . For example if , , which correponds to the degrees of freedom in a symmetric matrix.
Example 2**.**
In the case of and two possible amplitude values (i.e. ), the number of vectors in the non-overlapping symmetric set is . The non-overlapping symmetric set is given by the three frequency vectors below:
[TABLE]
It is easy to check that the four constraints are satisfied. Note that the fourth constraint reduces here to for or, equivalently when the unique index of (4) is used in lieu of .
5.2 Symmetric Designs
Assuming the use of a non-overlapping symmetric set, the expression of the normalized information matrix can be rewritten as a function of the weighting coefficients by substituting (15) into (10):
[TABLE]
where are the new elementary information matrices.
During the constrained optimization, the dispersion function will be computed based on these . We define:
[TABLE]
where the subscript indicates that is computed from the elementary information matrices . Notice that the values and dimensions of the dispersion functions and are different. This is because the dispersion function evaluates the quality of the input design relative to the considered search space.
5.3 Optimization Algorithm Revisited
By introducing the new elementary information matrices and the dispersion function , and as a result of the convex properties of the coefficients , the dispersion-based algorithm described in Subsection 3.4 can be reused to find the for which the determinant is maximal. Revisiting the 4-step algorithm leads to:
Initialize with a uniform design: 2. 2.
Compute the dispersion function for the current design using (18) 3. 3.
Update the design in accordance with the dispersion function as follows 4. 4.
Stopping criterion: if is smaller than a predefined threshold, the optimal solution is assumed to be found; else go to step 2.
6 Numerical Example
The methods above will now be illustrated on the following numerical example which consists of a finite impulse response filter, followed by a polynomial nonlinearity:
[TABLE]
with and . The value of will be fixed in order to make the problem identifiable. The noise is zero mean Gaussian distributed with unity variance. The amplitude set contains 10 uniformly spaced values between -1 and 1 (A=10). Because the system has a memory length of two (n=2), 100 different subsequences should be considered.
Notice that this numerical problem leads to a complete graph with 10 nodes and 100 edges. For such graph it is proven that the number of elementary cycles is 1112073 and that computing all these cycles has a complexity of 122328140 [20]. An implementation of Johnson’s algorithm [20] in Matlab takes multiple days in order to compute all these elementary cycles. Additionally the optimization problem would need to consider 1112073 variables which is not feasible. In contrast the symmetric design space contains only 55 frequency vectors that can be compute in a couple of seconds.
6.1 Unconstrained optimization
First, the unconstrained optimization is performed. This means that the relative frequencies are optimized with the dispersion-based optimization method described in subsection 3.4. The results for 1000 iterations are plotted in Fig.3. In each iteration step the dispersion function, relative frequency vector and determinant of the normalized information matrix are computed and stored.
The top plots represent the evolution of the maximum dispersion and of the determinant of the Fisher information matrix as a function of the iterations. They show that the method successfully lowers the maximal value of the dispersion and at the same time increases the determinant of the information matrix. Note that in the end the maximal dispersion reaches the value of 3, which corresponds to the number of free parameters. This indicates that the obtained solution is optimal (see Theorem 1).
The optimal relative frequency vector at the end of the iterations is depicted in the bottom left plot; it only contains 6 entries that are different from zero. This means that the optimal design is such that only 6 out of the 100 possible subsequences are used to excite the system. Each of them has the same frequency value. The bottom right plot represents the value of the dispersion function for each of the 100 subsequences.
Table 1 shows the subsequences of the optimal unconstrained design and their corresponding relative frequencies, while Fig. 4 represents the associated graph. Two observations should be made. First, the design does not obey the constraints (2). As a result, the frequency vector cannot be realized as a time sequence. Second, the graph is disconnected, which means that not every node can be reached from any other node.
6.2 Evolution of the Determinant
Now let us have a closer look at the evolution of the determinant in Fig.3. For the first 10 to 20 iterations there is a rapid increase in the determinant value. During these iterations, the number of different subsequences is reduced drastically. After this rapid change, the evolution of the determinant value becomes stable. Only the frequencies of the remaining subsequences are changed but the selection of subsequences stays the same. From this observation it can be concluded that selecting the optimal subset of subsequences is more important for the quality of the design than finding the optimal frequencies of the selected subsequences.
6.3 Constrained optimization
Next, the optimization is performed using the non-overlapping symmetric basis vectors obeying the constraints of Definition 8. This means that the optimization is performed with respect to the coefficients of (17) using the elementary information matrices . Again 1000 iterations are taken which leads to the plots in Fig.5. Due to the symmetry constraint, only coefficients need to be considered; hence in the bottom right plot ranges from 1 to 55. The determinant increases monotonically while the maximal value of the dispersion is driven to its minimal value of 3. This indicates that the final design is optimal in its subspace of constrained designs.
Table 2 shows the relative frequencies of the optimal constrained design which contains eight different subsequences with different frequencies. As expected, the design is symmetric, meaning that the subsequences and have the same frequency. This allows us to concatenate the subsequences without the need of unwanted transition subsequences, leading to a realizable design that is optimal in the constrained space. The same conclusion can be made from the associated graph in Fig. 6.
6.4 Computing the Dispersion
At first glance it seems contradictory that the optimal unconstrained and constrained design have the same maximum dispersion but different determinant values. However, the dispersion of the constrained design and the dispersion of the unconstrained design can not be compared directly, because they are based on different elementary information matrices. In order to make a comparison possible, the dispersion of the constrained solution is computed with respect to the elementary Fisher matrices .
The results are plotted in Fig.7. From the left plot it is clear that the dispersion function is larger than and . This indicates that the constrained solution is not optimal in the full frequency space. This is in accordance with the observation that the determinant of the constrained design is smaller than the determinant of the unconstrained design . See Table 3 for the exact determinant values.
6.5 Signal Generation
Both designs will be translated into a time sequence containing 100 subsequences. During this process three approximations are considered.
- •
After denormalization, the values of are rounded to the nearest natural number.
- •
The unconstrained design needs additional transient subsequences because condition (2) is violated.
- •
The constrained design needs additional transient subsequences when the graph is disconnected.
All these approximations lead to a decrease in the determinant of and alter the total subsequence count, making the resulting time sequences suboptimal.
First, the time sequence for the constrained design is generated. The resulting input sequence can be found in the top plot of Fig 8. The bottom plot depicts the difference in frequency between the optimal relative frequency vector and the generated time sequence. All errors are smaller than , meaning only rounding errors are present.
The unconstrained design does not satisfy the constraints (2). Therefore, the design will be slightly altered in order to find a time sequence which realizes the unconstrained design as well as possible. The graph of the altered design can be found in Fig. 9. Notice that the graph is no longer disconnected and satisfies the conditions in (2).
After altering the design a time sequence can be generated. The results can be found in Fig.10. The generated sequence contains 104 samples due to roundoff errors. If a sequence of exactly 100 samples is needed some and subsequences could be removed. The positive frequency errors reflect the change in frequency compared to the original subsequences. The negative errors reflect the addition of the dotted arrows to the design (see Fig.9).
6.6 Comparing Designs
Table 3 summarizes the performance of all previous designs. In order to remove the influence of the signal length, the Fisher matrix is computed based on the relative frequencies. As a reference for comparison, the maximum determinant out of 1000 randomly generated signals is also added.
Both the constrained and unconstrained designs perform two orders of magnitude better than the best randomly generated design. Out of the optimized designs, the unconstrained design has the highest determinant value. When this design is altered and realized as a time sequence, a decrease in the determinant can be observed, but the resulting sequence still outperforms the constrained design. Notice that there is no difference in determinant value between the constrained design and its corresponding time sequence.
From these observations it can be concluded that it is meaningful to compute both the constrained and unconstrained design. If the unconstrained design can easily be altered to a realizable design, without too much loss in performance, it should be preferred. If not, the constrained design presents a valuable alternative, because it can always be realized.
7 Computation time
To evaluate the computational complexity of the dispersion-based algorithm, we compared it with the general purpose convex optimizer cvx (we choose the SeDuMi as internal solver [26, 24]). Both algorithms are used to compute the optimal input for a given set of problems. For each problem the computation of the optimal input is performed ten times and the median of the computation time is used as measure of computing speed.
7.1 Stopping Condition
In order to guarantee that the quality of the solution computed by both solvers is equal, the determinant of the Fisher information matrix of the cvx solution is used as an absolute stopping criterion for the dispersion-based algorithm. This means that the dispersion-based algorithm keeps iterating until a solution with the same or higher determinant value is found.
7.2 Problem Parameters
For all problems, the system consists of an FIR filter followed by a polynomial nonlinearity. The filter coefficients correspond to the natural numbers between 1 and . The nonlinearity consists of a cubic and linear term and has the same coefficients as in the numerical example. It is assumed that the cubic parameter is fixed during the estimation. The optimization is performed over the search space spanned by the non-overlapping symmetric set.
Two distinct sets of problems will be solved. For the first set of problems, we increase the number of amplitude levels between -1 and 1 while keeping the system memory constant to 2. For the second set of problems, the input set is fixed to but the length of the system memory is increased.
The results of these simulations are depicted in Fig. 11. Notice that only the time for the optimization is considered. The time needed to compute the non-overlapping symmetric set, which is the same for both solvers, is depicted in Fig. 12.
7.3 Effect of the Amplitude Levels
The computation times for the first set of problems are depicted in the left subplots. We see that the computation time increases with the number of amplitude levels, regardless of the solver. This should be expected, as more amplitude levels lead to a bigger search space. For less than 20 amplitude levels the dispersion-based algorithm computes faster than cvx. Past the 20 amplitude levels the performance of the solution of cvx cannot be perfectly matched by the solution provided by dispersion-based algorithm, which leads to higher computation times. When we relax the stopping condition by allowing the dispersion-based algorithm to stop when it reaches 99% of the determinant value found with cvx, we see that the computation times of the dispersion-based method are lower (see left, lower subplot in Fig. 11). However, the difference between the two methods becomes smaller with increasing number of amplitude levels.
7.4 Effect of the Memory Length
In the second set of problems, the input set is fixed but the length of the system memory is increased. The computation times for the second set of problems are depicted in the right subplots. We can see that the computation time increases with the length of the memory of the system. This is normal because the memory length is exponentially proportional with the dimension of the search space. More importantly, we see that the dispersion-based algorithm reaches the same performance as cvx in shorter computation times. The difference is around two orders of magnitude.
7.5 Computing the Non-Overlapping Symmetric Set
Untill now, we only considered the time needed to perform the optimization. However, before we can perform the optimization, we need to construct the non-overlapping set and compute the Fisher information matrix for each vector in this set. In Fig. 12 the computation time for this step is depicted for increasing problem sizes. In the top plot we see the evolution for a fixed system memory and an increasing number of amplitude levels. In the bottom plot we see the evolution for a fixed number of amplitude levels and an increasing memory length. Especially from the bottom subplot it becomes clear that computing the non-overlapping set is the most time expensive step of the OID. For a memory length of 10, computing the set takes already more than 6 minutes, while the optimization takes 2 to 3 seconds.
7.6 Summary of the Results
From the above results we can conclude that the dispersion-based algorithm has a similar, if not better, performance compared to general purpose convex optimization algorithms for the presented optimal input problem. However, it turns out that the highest computational cost is not in the optimization of the design, but in the computation of the set describing the search space. As long as this bottle neck is not removed the choice of optimization algorithm is not critical. For performance comparison for large scale random optimal problems we refer to [23], where it is shown that for problems with 1000 variables or more, interior point methods have the best performance.
8 System Memory vs Subsequence Length
Until now we have assumed that the length of the subsequences and the memory of the system are equal. In [28] the connection between length of the subsequences and memory length has not been examined. In this section we will illustrate why it is optimal to choose the length of the subsequences and the memory of the system equal.
The subsequence length determines the search space of input sequences in which we try to find the optimal sequence. Remember that this search space corresponds to a polyhedron of which the corner points can be found through the elementary cycles of the associated graph. The longer the subsequence length the more complex the graph and the more corner points the polyhedron has.
The memory length of the system determines how any sequence from search space of input sequences is mapped onto a frequency vector. The longer the memory length, the larger the frequency vector search space becomes. As a result less sequences will be mapped on the same frequency vector.
If we chose the subsequence length and memory length equal, the corner points are mapped on a set of convex independent frequency vectors. However, when the memory length is smaller than the subsequence length, this is no longer the case. Some of the frequency vectors can now be written as a convex combination of the others. In other words the effort made to compute the additional corner points is wasted since the mapping to the frequency vector space makes some corner points redundant.
When the memory length of the subsystem is larger than the subsequence length. All corner points will be uniquely mapped to frequency vectors. However, compared to the case of equal lengths, the search space is now smaller. This reduction may exclude more informative designs that are still realizable.
9 Conclusion
In this work a solution to the problem of D-optimal input design for nonlinear FIR-type systems with an input taking a finite set of possible values has been presented.
By expressing the optimization problem with respect to the relative frequency vector, instead of the time sequence, the problem became convex. This convex problem was solved with an unconstrained optimization scheme based on the dispersion function.
However, it turned out that additional constraints are needed in order to guarantee that the optimal design can be realized as a time sequence. By imposing that the solution should lie in the subspace described by a symmetric and non-overlapping basis, a realizable solution was obtained that is optimal in its subspace of constrained solutions and remains numerically tractable.
In order to find a time sequence that realizes this optimal constrained design, the associated graph was introduced. It was shown that a path, using all edges in the graph as many times as their multiplicity indicates, corresponds to a time sequence that realizes the design.
Comparing the realization of the constrained design with the realizations of a random and unconstrained design showed that the determinant was highest for the unconstrained design. However, it can not be guaranteed that this design is realizable without a significant loss in determinant value. Therefore, the constrained design is proposed as an attractive alternative, because it can always be realized.
The methods presented in this paper were applied on a simple numerical example. Additionally the computational cost of the method was compared with the general purpose convex optimizer cvx. From this comparison it turned out that the dispersion based method has similar or better performance for medium sized problems. {ack} This work was supported in part by the Fund for Scientific Research (FWO-Vlaanderen), by the Flemish Government (Methusalem), by the Belgian Government through the Inter university Poles of Attraction (IAP VII) Program, and the ERC Advanced Grant SNLSID.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] X. Bombois, G. Scorletti, M. Gevers, P.M.J. Van den Hof, and R. Hildebrand. Least costly identification experiment for control. Automatica , 42(10):1651–1662, October 2006.
- 2[2] S. Boyd and L. Vandenberghe. Convex Optimization . Cambridge University Press, New York, NY, USA, 2004.
- 3[3] B.L. Cooley and J.H. Lee. Control-relevant experiment design for multivariable systems described by expansions in orthonormal bases. Automatica , 37(2):273–281, February 2001.
- 4[4] A. De Cock, M. Gevers, and J. Schoukens. A preliminary study on optimal input design for nonlinear systems. In Decision and Control (CDC), 2013 IEEE 52nd Annual Conference on , pages 4931–4936, Dec 2013.
- 5[5] V. Dinh, Rundell AE., and GT. Buzzard. Experimental design for dynamics identification of cellular processes. Bull Math Biol. , 76(3):597–626, 2014.
- 6[6] M. Fleury. Deux problèmes de géométrie de situation. Journal de mathématiques élémentaires , 2nd:257–261, 1883.
- 7[7] M. Forgione, X. Bombois, P.M.J. Van den Hof, and H. Hjalmarsson. Experiment design for parameter estimation in nonlinear systems based on multilevel excitation. In Control Conference (ECC), 2014 European , pages 25–30, June 2014.
- 8[8] G Franceschini and S Macchietto. Model-based design of experiments for parameter precision: State of the art. Chemical Engineering Science , 63:4846–4872, 2008.
