3. TOOLKIT DEMONSTRATION

Singular-Spectrum Analysis

Selecting the `SSA' button from the Analysis menu on the toolbar launches the following window (shows its state after pressing Default button, see below):

singular spectrum analysis

Having selected the data vector to be analyzed (here `soi', the SOI time-series) and the units of sampling interval, the principal SSA options to be specified are the Window Length, the type of Significance Test, and Covariance Estimator. The Default button is provided as a guide to select input parameters based on the length of the data time series.
The number of SSA Components specifies how many leading components in terms of captured variance will be retained for future analyses. After computation more information about these retained componentscan be obtained by using Advanced options (see below). Results of SSA are stored in matrix with a name specified in Spectrum field. In addition, T-EOFs and T-PCs (see below) are stored in matrices with names obtained by prefixing "eof_" and "pc_" to a Spectrum name, and can be accessed in Data I/O tool. If results from several SSA runs have been stored in different matrices, the parameters used in a particular SSA run will be restored in GUI by simply selecting correspondent matrix from a Spectrum pop-up list.


Window Length

As a rule of thumb, the window length M should be chosen to be longer than number of data points in the oscillatory periods under investigation, and shorter than number of data points in the spells of an intermittent oscillation. Vautard et al. (1992) recommend that the window length be less than about N/5 where N is the number of points in the time series. Robustness of results to M is an important test of their validity. The choice of window length sets the dimension of the lag autocorrelation matrix to be constructed and diagonalized by SSA, and thus determines the computational burden of the application. Larger values of M correspond to higher spectral resolution, although there is no direct equivalence between them.

We will set the Window Length to 60, which is a good choice for out time series (690 data points with a one month sampling rate) and the oscillatory periods (2 and 4 years) under investigation.


Covariance Estimation

The method for estimating the autocovariance matrix that is decomposed (diagonalized) by SSA is chosen by selecting either `Burg' , `VG' (Vautard &Ghil), or `BK ' (Broomhead &King) from the `Covariance' button on the main SSA control panel.

As indicated in the theoretical section, Burg estimation is an iterative process based on fitting an AR model with a number of AR components equal to the SSA window length. The Burg approach in principal should involve less "power leakage" due to the finite length of the time series and should therefore improve resolution (Penland et al., 1991). However, Vautard et al. (1992) found that the Burg estimate can induce significant biases when nonstationarities and very low-frequency variations are present in the series. Thus in some cases it will be worthwhile to try the Vautard et al. method. Also, for long series (N on the order of 5,000), the Vautard -Ghil estimate is less computationally burdensome and thus is completed more quickly.

In practice, we have found that the Vautard-Ghil method is more prone than is the Burg method to numerical instabilities (yielding negative eigenvalues) when pure oscillations are analyzed. Both the Burg and Vautard-Ghil methods impose a Toeplitz structure upon the autocovariance matrix whereas the Broomhead & King method does not. The Broomhead & King method is thus somewhat less prone to problems with nonstationary time series (Allen et al., 1992b), although the Vautard et al. method seems untroubled by all but the most extreme nonstationarities. The Toeplitz methods also impose symmetries on the EOF shapes whereas the Broomhead & King method does not. However the Toeplitz methods do appear to yield more stable results under the influence of minor perturbations of the time series (Allen and Smith,1996). 

NB: The Burg covariance matrix estimation option is not supported by Monte-Carlo signficance test which defaults to Vautard-Ghil if Burg is selected on the main SSA panel.

We used Burg covariance method for analyzing SOI time series.


Advanced Options

The Advanced Options panel is used to set the preferences for the particular significance test, as well as a table containing information about retained SSA components (the number specified in Components field, see above): captured variance, and rank or dominant frequency, based on which Signficance test has been used. The table below shows Advanced panel after performing SSA analysis on SOI index and is discussed in more details below.

SSA

By selectinfg rows of the table, user can plot correspondent T-EOFs or T-PCs, and perform reconstruction (discussed later). In addition, time series prediction can be done with settings defined in Prediction Options. See SSA Prediction chapter for more information.


Significance tests

There are three choices for Significance test

described below. If Heuristic option is selected, the eigenspectrum is displayed in order of eigenvalue rank. In the remaining two cases, the spectrum of eigenvalues is organized against the dominant frequency associated with the respective SSA eigenvector (T-EOF).  

Heuristic

It constructs a set of ad-hoc error bars that are based on the estimated decorrelation time of the time series according to:

dli = (2kL/N)1/2 li

where L is a typical decorrelation time for the time series, k is a user supplied Decorrelation weight between 1 and 2 (1.5 by default), and li is the i-th eigenvalue in the spectrum. The Toolkit estimates L to be the inverse of the logarithm of the lag-one autocorrelation of the time series. Weights close to 1 are closer--but not generally equal to--the liberal error bars of Vautard and Ghil (1989); weights approaching 2 are closer to the conservative error bars of Ghil and Mo (1991a).

After we click the Compute and Plot buttons a graphics window will open to display the eigenvalue spectrum of the specified SSA. The units of abscissa are SSA component number (eigenvalue rank), and with the variance contributed by each SSA component on the ordinate.

singular spectral analysis

SSA eigenvalue spectrum for 'soi'.

Since the window length was set to 60, SSA decomposes the time series into 60 components and thus 60 eigenvalues are plotted. The significance of the various components can be judged qualitatively by noting which components contribute significantly more variance relative to the noise background. The latter in turn assumed to include the components that lie in the flattish tail of the eigenvalue spectrum, i.e. components from about 10 to 60. The leading 10 components in a previous figure lie above a distinct break in the eigenvalue spectrum, and thus may be of physical significance. In particular, we are interested in the leading four components which form two pairs of nearly equal eigenvalues. As discussed in the theory section above, pair of nearly equal eigenvalues in SSA is one of the characterising as oscillation. However, the eigenvalues are subject to numerical and sampling errors (North et al., 1982), and mere pairing of eigenvalues is not enough to guarantee that an oscillation has been identified. In the eigenvalue plot above, the error bars show an ad hoc range of the estimation errors. (See below for more sophisticated significance tests.) Any of the eigenvalues with significantly overlapping error bars could represent an "oscillatory pair'. Also eigenvalues with error bars that overlap significantly with the error bars of the noise part of the spectrum must also be suspected of being part of that noise.

The Toolkit provides three pairing criteria to identify oscillatory pairs and trend components in clusters of eigenvalues whose ad-hoc error bars overlap; however, it is not necessary for the 'Heuristic' test to be selected on the main SSA panel. The results of these tests are written to the log file, that can be accessed in Log tab. These tests can be activated concurrently using checkboxes in Advanced panel

In the SOI example, the log file shows that pair 1-2 of SSA components passes both `Same Frequency' and `Strong FFT' tests for being oscillation, while the 5th component is a trend (see here for more on SSA detrending):

We will return to investigate the oscillations after considering some more of the controls on the main SSA panel.

Monte Carlo

The Monte Carlo SSA (MCSSA) significance test is described by Allen and Smith (1996). It tests significance against a red noise null-hypothesis. The test dataset of Allen and Smith (1996) is used in "Small signal" example of this guide. A large ensemble of red-noise surrogate time series are generated, each with the same length and same expected lag-1 autocorrelation as the time series to be tested (but see note under 'Include EOFs' below).

Two fixed T-EOF bases are used using Basis popup menu in Advanced Options:

`Include EOFs': List of "signal" EOFs (by rank) to include in the noise null-hypothesis, from which the noise surrogates and null-hypothesis EOFs are computed. Note that irrespective of basis (data or null-hyp. EOFs), the characteristics of the noise, and thus the error bars are modified by including data EOFs. In fact the contribution of the latter to the lag-1 autocorrelation of the data time series are removed, when they are included in the null-hypothesis, see Allen and Smith (1996). If no data EOFs are specified, the null-hypothesis is pure AR(1) noise. Set of space-delimited integers.

`Confidence': in percentiles, e.g. 95 will mean 95% confidence level with respect to red noise null-hypothesis.

`No. Surrogates:' Number of Monte Carlo noise realizations.

Chi-squared

This is the default test.. It follows the Monte Carlo SSA (MCSSA) approach (see above) in which red-noise surrogates are projected onto the basis consisting of the SSA eigenvectors (T-EOFs) of either the data or null-hypothesis covariance matrix. In contrast to MCSSA, the chi-squared test approximates the distribution of surrogate projections as chi-squared with 3M/N degrees of freedom. This is quite accurate for the case of sinusoidal EOFs and a linear (AR-type) noise null-hypothesis -- see Allen and Smith (1996), appendix. The probability of n excursions above the mth percentile is then estimated using the binomial distribution. This method is computationally fast compared to MCSSA which generates a large ensemble of surrogate noise series.

Using SSA for identifiying oscillatory components

A typical testing procedure for Monte Carlo or Chi-Squared tests would be as follows (i) begin by testing against pure noise and identify frequencies at which the data displays anomalously high power against this null-hypothesis; (ii) find the data EOFs which correspond to these frequencies (being data-adaptive, these will often provide a better filter for the signal than the corresponding EOFs of the noise); finally (iii) re-test including these EOFs in the null-hypothesis to check if there are other features in the spectrum which were concealed by the dominant signal . It is, of course, consistent to iterate this procedure, but beware of the problem of test multiplicity discussed in Allen and Smith (1996) section 4.2: iteration makes the test increasingly liberal, so results which are still on the margins of acceptable significance after a couple of iterations including successively more EOFs into the "signal" should be ignored. 

Accordingly, to test against a pure red-noise null-hypothesis, we choose Chi-Squared as a significance test (leaving the Include EOFs field blank). After clicking the Compute and Plot buttons, and adjsuting the X-axis limits in Graph Controls window, the following graphics window will open:

significance test

Chi^2 test for SOI

Here the variance is plotted against the dominant frequencies of 'data' EOFs on x-axis, and describe projection of data and null-hypothesis (NH) surrogates onto 'data' EOFs basis (see above). The low-frequency part of the spectrum has been zoomed in by using Graph Controls. The dominant frequencies of EOFs are calculated with a 0.001/dt resolution in the Nyquist interval from 0 to 0.5/dt. The error bars represent 95% of variance (with settings of Confidence levels such as in Advanced SSA Options) we expect to find in the state-space direction defined by a particular EOF from the 'basis', when analyzing a segment of 'NH' noise. Using 'Include EOFS' option we can construct a composite 'NH' as described above, but it is a pure red-noise here. For this graph the 'Basis' consists of the EOFs of the data covariance matrix, and the projection of data gives us SSA eigenvalues plotted as dots.

For the Data Eofs basis we observe that eigenvalues corresponding to EOFS 1-2 and 3-4 appear almost superimposed on each other at frequencies equal to ~ 0.02 and 0.034 cycle/month, and lie outside the null-hypothesis error bars. Thus they are relatively unlikely ( at the 95% level set in Confidence levels of Advanced Options) to be merely due to selected null-hypothesis process, and represent two significant oscillations. The results of the Chi-squared test should always be checked using the Monte Carlo approach, which is also essential with more complex noise models.  We leave it as an exercise to the reader to check that similar results are obtained for our SOI time series with the Monte Carlo SSA test.

One can also choose 'basis' EOFs to be computed from the expected covariance matrix of the 'Null-hypothesis' by selecting AR(1) from the basis popup menu in Advanced SSA options. Projection of data does not give SSA eigenvalues here, but the interpretation of the graph is the same. Note how the EOFs of the pure noise NH are almost regularly spaced. According to Allen and Smith ('96) the NH basis avoids artificial variance-compression inherited in SSA and therefore it has a lower probability of false-positive results, i.e. identifying the noise components as significant. After doing calculations again the resulting spectrum is shown below:

significance test

The NH basis also confirms the significance of the identified significant pairs, and we conclude that they correspond to the QB and QQ components of El Niño.


Reconstruction and Prediction.

In a demo version SSA Reconstruction/Prediction is available only for data of example projects! In a licensed copy this feature is enabled after activation with a purchased Serial No.

The SSA components table on the Advanced panel allows the reconstruction and prediction of the original time series from selected SSA components, as well as plotting associated T-EOFs and T-PCs by using Plot EOFs and Plot PCs buttons. The name of vector with reconstruction/prediction is specified by the user in Result field. Individual RC components are stored in matrix with name obtained by prefixing "mat_" to Result name, and can be accessed in Data I/O tool. The Result field is shared with the SSA gap-filling feature.

If results from several reconstructions have been stored in different vectors, the parameters for a particular reconstruction will be restored in GUI by simply selecting correspondent vector from Result pop-up list. For prediction user need to specify lead time in Lead field, and order of AR model to advance in time selected SSA components in Prediction Options panel; cross-validation is available as well for finding optimal parameters for prediction, see SSA prediction for more details. Checking "fliter out" box will filter out the selected components from the orginal data (prediction feature is disabled then); this can be quite useful for detrending.

Selecting the 1-4 table rows of identified 'significant' components , and after hitting Compute in Reconstruction/Prediction box, followed by Plot, next figure will be obtained (after adjusting its parameters in Graph Controls window):

Prediction

SSA Reconstruction.


Here the reconstructed signal is plotted against original time series. If this SSA-filtered RC series is now analyzed by MEM or MTM, the frequency spectrum of the RCs is much simpler than that of the complete time series, since most of the "noise" has been removed.
We leave as an exercise for the user to repeat SSA analysis with 11 SSA Components, and try to identify pair 10-11 with the oscillatory signal. Hint: this is related to the peak in
MTM Spectrum at f~0.18 cycles/month.