A method created by Marco C. Campi and Erik Weyer.

LSCR is a general method for system identification.

Unlike standard identification methods, LSCR does not deliver a single model; instead, it delivers a model set.

As the amount of information increases, the model set shrinks around the true system and - for any finite sample size - the set contains the true system with a guaranteed probability chosen by the user.

The need for a set of models

Even when the true system belongs to the model class, we cannot expect a model identified from a finite number of data points to coincide with it, since there are a number of noise sources affecting our data: measurement noise, disturbances acting on the system, etc.

As a consequence, under general circumstances the only probabilistic claim we are in a position to make is that the estimate coincides with the true parameter with probability zero,

which is clearly a useless statement if our intention is to credit the model with reliability.

To obtain certified reliability results, one can move from nominal-model to model-set identification.

The situation is depicted in Figure 1: for any number of data points N, the parameter estimate is affected by random fluctuation, so that it has probability zero of exactly hitting the true system parameter value. Considering a region around the estimate, however, elevates the probability that the true parameter belongs to the region to a nonzero - and therefore meaningful - value.

Fig.1 Random fluctuation in parameter estimate.

This observation points to a simple, but important fact:

exploiting the information conveyed by a finite number of data points can at best generate a model set to which the system belongs. Any further attempt to provide sharper estimation results (e.g. one single model) goes beyond the information available in the data and generates results that cannot be guaranteed.

Challenging the reader with a preliminary example

Consider the system

Assume we know that the noise is an independent process with symmetric distribution around zero. Apart from this, no knowledge of the noise is assumed: it can have any (unknown) distribution (Gaussian, uniform, etc.), and its variance can be any (unknown) number, from very small to very large. We do not even make any stationarity assumption, and allow the noise distribution to vary with time.

The assumption that the noise is independent can be interpreted as saying that we know the system structure: it is an autoregressive system of order 1.

Nine data points were generated according to the system; they are shown in Figure 2.

Can you guess the value of the parameter, or give a confidence region for it?

Fig. 2 Data for the preliminary example.

To form a guaranteed confidence region for the parameter, we use LSCR.

Rewrite the system as a model with generic parameter:

The predictor and prediction error associated with the model are

Next we compute the prediction errors for the available time instants, and calculate

Note that these are functions of the model parameter that can indeed be computed from the available data set. Then, we take different averages of these functions. Precisely, we form 8 averages of the form:

where the sets are subsets of the index set, containing the elements highlighted by a bullet in the table below. The resulting average functions can be interpreted as empirical 1-step correlations of the prediction error.

The average functions obtained for the data in Figure 2 are displayed in Figure 3.

Fig.3 The average functions.

Can you now guess the value of the parameter?

Let me give you a hint: the functions have a tendency to intersect the horizontal axis near the true parameter value. Why is it so? Let us re-write one of these functions, evaluated at the true parameter value, as follows:

The right-hand side is zero mean and, due to averaging, its "vertical displacement" from the zero line is reduced. So, we would like to claim that the true parameter is "somewhere" near where the average functions intersect the horizontal axis.

While the above reasoning makes sense, LSCR provides us with a much stronger claim:

RESULT: discard the rightmost and leftmost regions where at most one function is less than zero or greater than zero. The resulting interval [-0.04, 0.48] (see Figure 3) is a confidence region whose exact probability of containing the true parameter is 0.5.
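For readers who want to experiment, the construction above can be sketched in a few lines of code. Everything below is an illustrative reconstruction: the AR(1) form, the parameter value 0.2, the uniform noise (the values reported for the later trials), and the particular subsets (a GF(2) span of three arbitrary generators, not the article's table) are our assumptions, so the resulting interval will not match Figure 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data generation (assumed form): y_t = theta0*y_{t-1} + w_t,
# theta0 = 0.2, noise uniform on [-1, 1].
theta0 = 0.2
N = 9
w = rng.uniform(-1.0, 1.0, size=N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = theta0 * y[t - 1] + w[t]

# Prediction errors eps_t(theta) = y_t - theta*y_{t-1} on a parameter grid.
thetas = np.linspace(-1.0, 1.0, 2001)
eps = y[1:, None] - thetas[None, :] * y[:-1, None]      # shape (8, grid)

# Empirical 1-step correlations f_t(theta) = eps_t(theta) * eps_{t+1}(theta).
f = eps[:-1] * eps[1:]                                   # shape (7, grid)

# Averages over subsets forming a group under symmetric difference:
# here, all GF(2) combinations of three arbitrary generator rows.
gens = np.array([[1, 1, 0, 1, 0, 0, 1],
                 [0, 1, 1, 0, 1, 0, 1],
                 [0, 0, 1, 1, 0, 1, 1]])
masks = [(a * gens[0] + b * gens[1] + c * gens[2]) % 2
         for a in (0, 1) for b in (0, 1) for c in (0, 1)]
subsets = [np.flatnonzero(m) for m in masks if m.any()]  # drop the empty set
g = np.array([f[idx].mean(axis=0) for idx in subsets])

# Keep theta where at least 2 averages are > 0 and at least 2 are < 0,
# mimicking the "discard where at most one function dominates" rule.
keep = ((g > 0).sum(axis=0) >= 2) & ((g < 0).sum(axis=0) >= 2)
if keep.any():
    print("interval approx [%.3f, %.3f]"
          % (thetas[keep].min(), thetas[keep].max()))
```

With a different noise realization the interval moves, but, per the RESULT above, it covers the true parameter in a guaranteed fraction of realizations.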

Comments.

1. The interval is stochastic because it depends on the data; the true parameter value is not, and it has a fixed location that does not depend on any random element. Thus, what the above RESULT says is that the interval is random and contains the true parameter value in 50% of the cases.

Fig.4 10 more trials.

To better understand the nature of the result, we performed 10 more simulation trials, obtaining the results in Figure 4. The true parameter and the noise were as follows: parameter value 0.2, noise independent with uniform distribution between -1 and +1.

2. In this example, the probability is low (50%) and the interval is rather large. With more data, we obtain smaller intervals and higher probabilities.

3. The LSCR algorithm was applied with no knowledge about the noise level or distribution and, yet, it returned an interval whose probability was exact, not an upper bound. The key is that the above RESULT is a "pivotal" result as the probability remains the same no matter what the noise characteristics are.

4. LSCR works along a totally different inference principle from standard Prediction Error Minimization (PEM) methods. In particular - differently from the asymptotic theory of PEM - LSCR does not construct the confidence region by quantifying the variability in the estimate.

LSCR for general linear systems

Data generating system

Consider the general linear system in Figure 5.

Fig. 5 The system.

Assume that the input and the noise are independent processes (open-loop operation). For the closed-loop case, see the section on closed-loop systems below.

No a-priori knowledge about the noise is assumed.

The basic assumption is that the system structure is known. Correspondingly, we take a full-order model class of the form:

Goal: construct an algorithm that works in the following way (see Figure 6): the algorithm takes a finite input-output data set and a user-chosen probability as inputs, and returns a confidence region that contains the true parameter with that probability.

Fig. 6 The algorithm.

Construction of confidence regions

Two types of confidence sets are constructed: one based on autocorrelations of the prediction error, and one based on cross-correlations between the input and the prediction error.

A confidence region for the true parameter is usually obtained by taking the intersection of a few of these sets (see below).

------------------------------------------------------------------------------

Procedure for the construction of the autocorrelation-based sets

------------------------------------------------------------------------------

(this generalizes the preliminary example)

(1) Compute the prediction errors

for a finite number of time instants;

(2) Select an integer lag. For each admissible time instant, compute

(3) Consider a collection of subsets of the index set forming a group under the symmetric difference operation (i.e., if two subsets belong to the collection, so does their symmetric difference). Compute

(4) Select an integer q in the interval [1, M/2] and find the region such that at least q of the M average functions are bigger than zero and at least q are smaller than zero.

Theorem 1 below states that the probability that the region contains the true parameter is exactly equal to 1 - 2q/M. Thus, q is a free parameter the user can employ to set the probability of the confidence region.
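Step (4) amounts to a simple sign-counting operation over a parameter grid. A minimal sketch (the function name `lscr_mask` is ours, not from the article):

```python
import numpy as np

def lscr_mask(g, q):
    """g: array of shape (M, n_grid) holding the M group-averaged correlation
    functions evaluated on a parameter grid. Returns a boolean mask over the
    grid marking the points where at least q functions are above zero and
    at least q are below zero (the LSCR confidence region)."""
    npos = (g > 0).sum(axis=0)
    nneg = (g < 0).sum(axis=0)
    return (npos >= q) & (nneg >= q)

# Tiny synthetic check: three "functions" on a 5-point grid.
g = np.array([[ 1.0,  1.0, -1.0, -1.0, -1.0],
              [ 1.0, -1.0,  1.0, -1.0, -1.0],
              [-1.0,  1.0,  1.0,  1.0, -1.0]])
print(lscr_mask(g, q=1))   # -> [ True  True  True  True False]
```

Larger q shrinks the region and lowers the guaranteed probability 1 - 2q/M accordingly.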

In the procedure for the construction of the second type of set, the empirical autocorrelations in point (2) are replaced by empirical cross-correlations between the input signal and the prediction error.

---------------------------------------------------------------

Procedure for the construction of the cross-correlation-based sets

---------------------------------------------------------------

(1) Compute the prediction errors

for a finite number of time instants;

(2) Select an integer lag. For each admissible time instant, compute

(3) Consider a collection of subsets of the index set forming a group under the symmetric difference operation. Compute

(4) Select an integer q in the interval [1, M/2] and find the region such that at least q of the M average functions are bigger than zero and at least q are smaller than zero.

Comments

1. The procedures return regions of guaranteed probability even though no a-priori knowledge about the noise is assumed: the noise enters the procedures through the data only. One could phrase this by saying that the procedures let the data speak, without assuming in advance what they have to tell us.

2. The noise level does impact the final result, as the shape and size of the region depend on the noise via the data.

3. In the theorem, the probabilities 1 - 2q/M are exact, not bounds. The theorem is therefore free of any conservatism.

Usually, a confidence set is obtained by intersecting a number of sets of the two types, i.e.

An obvious question to ask is how one should choose the correlations and the sets to intersect in order to obtain well-shaped confidence sets that are bounded and concentrate around the true parameter as the number of data points increases. The answer depends on the model class under consideration, and this issue is discussed in "LSCR properties".

From Theorem 1, it follows (Theorem 2) that the intersected set contains the true parameter with probability at least 1 minus the sum of the terms 2q/M of the intersected sets.

Theorem 2 can be used in connection with robust design procedures: if a problem solution is robust with respect to the parameter over the confidence set, in the sense that a certain property is achieved for any parameter value in the set, then such a property is also guaranteed for the true system with the chosen probability.

Example 1 - ARMA system

Consider the ARMA system

where the noise is an independent sequence of zero-mean Gaussian random variables with variance 1. 1025 data points were generated.

The model has predictor and prediction error given by

In order to form a confidence region for the parameters, we calculated

and then computed

using the Gordon group. Next we discarded those parameter values for which zero was among the 12 largest or 12 smallest values of either empirical correlation.

According to Theorem 2, the true parameter belongs to the constructed region with a guaranteed probability.
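The guaranteed level implied by Theorem 2 can be computed explicitly. A quick check, under the assumption (ours, not stated above) that the Gordon group built from the 1025 data points has M = 1024 elements:

```python
M, q = 1024, 12            # assumed group size; discard depth from the text
p_each = 1 - 2 * q / M     # exact probability for each of the two sets (Theorem 1)
p_joint = 1 - 2 * (2 * q / M)  # union bound for their intersection (Theorem 2)
print(p_each, p_joint)     # -> 0.9765625 0.953125
```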

The obtained confidence region is the blank area in Figure 7.

Using the algorithm for the construction of the confidence sets, we have obtained a bounded confidence set with a guaranteed probability based on a finite number of data points. As no asymptotic theory is involved, this is a rigorous finite-sample result. For comparison, Figure 7 also shows the 95% confidence ellipsoid obtained using the asymptotic theory (e.g. L. Ljung, "System Identification - Theory for the User", Chapter 9, Prentice Hall, 1999). The two confidence regions are of similar shape and size in this case.

Fig.7 Non-asymptotic confidence region (blank region) and asymptotic confidence ellipsoid. The true parameter and the parameter estimated using a prediction error method are also marked.

Example 2 - A closed-loop system

This example was originally introduced in Garatti et al. (2004) to demonstrate that the asymptotic theory of PEM can at times deliver misleading results even with a large number of data points. It is re-examined here to show how LSCR works in this challenging situation.

Consider the system of Figure 8 where

Fig.8 The closed-loop system.

the noise is white Gaussian with variance 1 and the reference is also white Gaussian, with a much smaller variance. Note that the variance of the reference signal is very small compared to the noise variance; that is, there is poor excitation. The present situation - though admittedly artificial - is a simplification of what often happens in practical applications of identification, where poor excitation is due to the closed-loop operation of the system.

2050 input-output measurements were generated to be used for identification.

We first use PEM identification.

A full-order model was identified. The amplitude Bode diagrams of the input-to-output transfer function of the identified model and of the real system are plotted in Figure 9. The plot shows a big mismatch between the real plant and the identified model, a fact that does not come as too much of a surprise considering that the reference signal is poorly exciting.

An analysis conducted in Garatti et al. (2004) shows that, in the limit of zero reference variance, the asymptotic PEM identification cost has two isolated global minimizers: one is the true parameter and the second is a spurious parameter. When the reference variance is nonzero but small, as is the case in our actual experiment, the spurious parameter no longer minimizes the asymptotic cost, but random fluctuations in the identification cost due to the finiteness of the data points may still trap the estimate near the spurious minimizer, generating a totally wrong identified model.

Fig.9 The identified transfer function.

Let us now see what the asymptotic theory delivers as a 90% confidence region. Figure 10 displays the confidence region in the frequency domain: surprisingly, it concentrates around the identified model, so that in a real identification application, where the true transfer function is not known, we would conclude that the estimated model is reliable, a totally misleading result.

Fig.10 90% confidence region for the identified transfer function obtained with the asymptotic theory.

Return now to the LSCR approach.

LSCR was used in a totally "blind" manner, with no concern at all for the existence of local minima: the method is guaranteed by theory and it works in all possible situations.

The prediction error is given by

We used a Gordon group with 2048 elements, and computed three empirical correlations over the parameter space. We excluded the regions where 0 was among the 34 smallest or largest values of any of the three correlations to obtain a 90% confidence set (see Theorem 2). The confidence set is shown in Figure 11.
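The 90% level follows from the same arithmetic as in Theorem 2; a quick check with the numbers just given (M = 2048, q = 34, three correlation families):

```python
M, q, k = 2048, 34, 3      # group size, discard depth, number of correlations
p_each = 1 - 2 * q / M     # exact probability for each region (Theorem 1)
p_joint = 1 - k * 2 * q / M  # union bound for the intersection (Theorem 2)
print(p_each, p_joint)     # -> 0.966796875 0.900390625
```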

The set consists of two separate regions, one around the true parameter and one around the spurious parameter. This illustrates the global nature of the approach: LSCR produces two separate regions as the overall confidence set because the information in the data is intrinsically unable to tell us which of the two regions contains the true parameter.

Fig.11 90% confidence set.

Figures 12 and 13 show close-ups of the two regions. The ellipsoid in Figure 12 is the 90% confidence set obtained with the asymptotic PEM theory: when the PEM estimate gets trapped near the spurious minimizer, the confidence ellipsoid is centered around this spurious parameter because the PEM asymptotic theory is local in nature (it is based on a Taylor expansion) and is therefore unable to explore locations far from the identified model. This is why in Figure 10 we obtained a frequency-domain confidence region unable to capture the real model uncertainty. The reader is referred to Garatti et al. (2004) for more details.

Fig.12 Asymptotic 90% confidence ellipsoid, and the part of the non-asymptotic confidence set around the spurious parameter.

Fig.13 Close-up of the non-asymptotic confidence region around the true parameter.

LSCR properties

Theorems 1 and 2 quantify the probability that the true parameter belongs to the constructed regions. However, these theorems deal only with one side of the story. In fact, a good evaluation method must have two properties:

the provided region must have guaranteed probability (and this is what Theorems 1 and 2 deliver);

the region must be bounded, and concentrate around the true parameter as the number of data points increases.

Securing this second property requires choosing the correlations and the sets to intersect in a suitable way, and the correct choice depends on the model class. For a general discussion, particularly in connection with ARMA and ARMAX models, see Campi and Weyer (2005).

Presence of unmodeled dynamics

LSCR can also be used in the presence of unmodeled dynamics.

General ideas are discussed here by means of two simple examples:

identifying a full-order transfer function between input and output, without deriving a model for the noise;

presence of unmodeled dynamics in thetotransfer function.

Identification without a noise model: an example

Consider the system

Suppose that the structure of the input-to-output transfer function is known. The noise, instead, describes all other sources of variation in the output apart from the input, and we do not want to make any assumption on how it is generated. Correspondingly, we want our results regarding the parameter value to be valid with no limitations whatsoever on the deterministic noise sequence.

Assume that we have access to the system for experimentation: we generate a finite number of input data, say 7 for the sake of simplicity, and - based on the collected outputs - we are asked to construct a confidence interval for the parameter of guaranteed probability.

The problem is challenging: since the noise can be anything, it seems that the observed data cannot help us construct a confidence region. In fact, for any given parameter value and input sequence, a suitable choice of the noise sequence can lead to any observed output signal! Let us see how LSCR gets around this problem.

Before proceeding, we would like to clarify what is meant here by "guaranteed probability". We said that the noise is regarded as a deterministic sequence, and the result is required to hold true for any such sequence, that is, uniformly in the noise. The stochastic element is instead the input sequence: we will select the input according to a random generation mechanism and require that the interval contains the true parameter with a given probability, where the probability is with respect to the random choice of the input.

We first indicate the input design and then the procedure for construction of the confidence interval.

Input design

Let the input samples be independent and identically distributed with distribution

Procedure for construction of the confidence interval

Rewrite the system as a model with generic parameter:

Construct a predictor by dropping the noise term:

Next, we compute the prediction errors from the observed data for each time instant and calculate

Then, we take different averages of these functions. Precisely, we form 8 averages of the form:

where the sets are subsets of the index set, containing the elements highlighted by a bullet in the table below.

The confidence interval is where at least two functions are below zero and at least two functions are above zero.

A simulation example was run where the noise sequence was as shown in Figure 14. This noise sequence was obtained as a realization of a biased independent Gaussian process with mean 0.5 and variance 0.1. The obtained average functions are given in Figure 15.

We located the interval where at least two functions were below zero and at least two were above zero, obtaining the interval in Figure 15. Generalizing the theory for the case of no unmodelled dynamics, one can establish that the obtained interval has exact probability 0.5 of containing the true parameter (see Campi and Weyer (2006)).

Fig.14 Noise sequence.

Fig.15 The average functions.
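The randomized-input idea can be sketched in code. Everything below is illustrative: the system form y_t = theta0*u_t + d_t, the value theta0 = 0.5, the +/-1 input law, and the subsets are our assumptions, not the article's exact choices. The point is that the correlations u_t * eps_t(theta) cross zero near the true parameter even though the "noise" d is biased and deterministic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical system form: y_t = theta0*u_t + d_t with deterministic, biased noise d.
theta0 = 0.5                              # illustrative true parameter
N = 7
d = 0.5 + 0.1 * np.sin(np.arange(N))      # arbitrary fixed sequence; no assumption on it is used
u = rng.choice([-1.0, 1.0], size=N)       # randomized input: i.i.d. +/-1 (assumed input law)
y = theta0 * u + d

thetas = np.linspace(-1.0, 2.0, 3001)
eps = y[:, None] - thetas[None, :] * u[:, None]   # prediction errors, noise term dropped
h = u[:, None] * eps                               # u_t*eps_t(theta) = (theta0 - theta) + u_t*d_t

# Averages over subsets closed under symmetric difference (illustrative GF(2) span),
# then keep theta where at least two averages are positive and at least two negative.
gens = np.array([[1, 1, 0, 1, 0, 0, 1],
                 [0, 1, 1, 0, 1, 0, 1],
                 [0, 0, 1, 1, 0, 1, 1]])
masks = [(a * gens[0] + b * gens[1] + c * gens[2]) % 2
         for a in (0, 1) for b in (0, 1) for c in (0, 1)]
subsets = [np.flatnonzero(m) for m in masks if m.any()]
g = np.array([h[idx].mean(axis=0) for idx in subsets])
keep = ((g > 0).sum(axis=0) >= 2) & ((g < 0).sum(axis=0) >= 2)
if keep.any():
    print("interval approx [%.3f, %.3f]"
          % (thetas[keep].min(), thetas[keep].max()))
```

Since u_t*d_t is symmetric around zero under the random +/-1 input, each average is a decreasing line in theta crossing zero near theta0, which is what makes the sign-counting rule informative despite the noise bias.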

Unmodeled dynamics in the transfer function between u and y: an example

Suppose that a system has structure

while - for estimation purposes - we use the reduced order model

The noise can be anything, and we regard it as a generic unknown deterministic signal.

After determining a region for the model parameter, one sensible question to ask is: does this region contain, with a given probability, the system parameter linking input to output?

Reinterpreting the above question, we are asking whether the projection of the true transfer function onto the 1-dimensional space spanned by constant transfer functions is contained in the estimated set with a certain probability.

Input design

Let the input samples be independent and identically distributed with distribution

Procedure for construction of the confidence interval

Construct a predictor by dropping the noise term, whose characteristics are unknown:

Next, we compute the prediction errors from the observed data for each time instant and calculate:

Then, we take averages of these functions in different ways. Precisely, we form 8 averages of the form:

where the sets are subsets of the index set, containing the elements highlighted by a bullet in the table below.

The confidence interval is where at least two functions are below zero and at least two functions are above zero.

A simulation example was run where the noise was the realization of the biased Gaussian process shown in Figure 14. As the input can only take on the values -1, 1 and 0, it is possible that two or more of the average functions take on the same value on an interval. This tie is broken by introducing a random ordering. The obtained functions and confidence region are shown in Figure 16.

Fig.16 The average functions.

Generalizing the theory, one can establish that the obtained interval has exact probability 0.5 of containing the true projected parameter (see Campi and Weyer (2006)).

Nonlinear systems

Here we discuss an example. The reader is referred to Dalai et al. (2007) for more details.

Consider the following nonlinear system

where the noise is an independent and symmetrically distributed sequence and the parameter is unknown.

This system can be made explicit with respect to the noise as follows:

and - by substituting the true parameter with a generic one and renaming the so-obtained right-hand side as the prediction error - we have

Second-order statistics are in general not enough to identify the true parameter value for nonlinear systems. The good news is that LSCR can be extended to higher-order statistics with little effort. A general presentation of the results can be found in Dalai et al. (2007). Here, it suffices to say that we can, e.g., consider a third-order statistic of the prediction error and the theory goes through.

As an example, we generated 9 samples from the system, where the noise samples were zero-mean Gaussian with variance 1. Then, we constructed

where the sets are subsets of the index set, containing the elements highlighted by a bullet in the table below.

These functions are displayed in Figure 17. The interval marked in blue, where at least two functions were below zero and at least two were above zero, has exact probability 0.5 of containing the true parameter.

Fig.17 The average functions.

Closed-loop

Closed-loop systems can be treated within the open-loop framework by regarding the reference and the noise as external signals of the closed-loop system, see Figure 18. The LSCR theory can be applied unaltered in this setting.

Fig.18 Closed-loop recast as open-loop.

Gordon's construction of the incidence matrix of a group

Given a positive integer n, the incidence matrix of a group of subsets of {1, ..., n} is a matrix whose (i, j) element is 1 if element j belongs to the i-th subset, and zero otherwise. In L. Gordon, "Completely separating groups in subsampling", Annals of Statistics, vol. 2, pp. 572-578, 1974, a recursive construction procedure for such an incidence matrix is proposed for suitable values of n and of the number M of group elements.

Initialize the incidence matrix and recursively compute

where the ones denote, respectively, a matrix and a vector of all ones, and 0 is a vector of all zeros. Then, let

Gordon (1974) also gives constructions of groups for numbers of data points not covered by the above recursion.
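We do not reproduce Gordon's exact recursion in code here, but the group structure it produces can be illustrated as follows: spanning any set of generator rows over GF(2) yields a collection of subsets closed under symmetric difference, and every index covered by the group lies in exactly half of the M subsets, the balance property underlying the exact 1 - 2q/M probabilities. A sketch, with arbitrarily chosen generators:

```python
import numpy as np
from itertools import product

def span_group(gens):
    """Rows of the returned 0/1 incidence matrix are all GF(2) combinations of
    the generator rows: a collection of subsets of {0,...,n-1} closed under
    symmetric difference, including the empty set (the group identity)."""
    gens = np.asarray(gens, dtype=int) % 2
    combos = np.array(list(product([0, 1], repeat=gens.shape[0])))
    return (combos @ gens) % 2            # shape (2^k, n)

R = span_group([[1, 1, 0, 1, 0, 0, 1],    # arbitrary illustrative generators
                [0, 1, 1, 0, 1, 0, 1],
                [0, 0, 1, 1, 0, 1, 1]])

# Closure under symmetric difference: the XOR of any two rows is again a row.
rows = [tuple(r) for r in R]
xor = lambda a, b: tuple(x ^ y for x, y in zip(a, b))
assert all(xor(a, b) in set(rows) for a in rows for b in rows)

# Balance: every index covered by the group lies in exactly half of the subsets.
print(R.sum(axis=0))                      # -> [4 4 4 4 4 4 4]
```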

Papers

M.C. Campi and E. Weyer.

Guaranteed non-asymptotic confidence regions in system identification.

Automatica, 41:1751-1764, 2005.

(the downloadable file is an extended version (with all proofs) of the Automatica paper)

M.C. Campi and E. Weyer.

Identification with finitely many data points: the LSCR approach.

Semi-plenary presentation. In Proc. Symposium on System Identification, SYSID 2006, Newcastle, Australia, 2006.

M. Dalai, E. Weyer and M.C. Campi.

Parameter Identification for Nonlinear Systems: Guaranteed Confidence regions through LSCR.

Automatica, 43:1418-1425, 2007.

Other related papers are downloadable from M.C. Campi's webpage