3. TOOLKIT DEMONSTRATION

SSA Time Series Prediction

(Note: In the demo version, SSA Reconstruction/Prediction and Cross-validation are available only for data in the example projects. In a licensed copy, these features are enabled after activation with a purchased Serial No.)

A combined SSA-MEM method can be used for time series prediction. It starts by performing SSA filtering with window M on the input time series to identify the SSA "signal" components, i.e. those containing oscillatory modes and/or trends. Next, an AR model is fitted to the "signal" PCs and used to advance them in time, producing PC forecasts. Finally, SSA reconstruction is performed using the PC forecasts. For small AR orders, forecast errors are dominated by the lack of resolution; for large orders, by errors in the estimated model coefficients. An AR order equal to M is the most consistent with the SSA analysis and is the default in kSpectra, but it can be too large, because the variance of the AR coefficient estimates increases with the order. Cross-validation (discussed below) can be used to identify the optimum SSA window M and AR order for prediction. See MSSA prediction for an example with multivariate data.
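The three steps above (SSA filtering, AR extension of the signal PCs, reconstruction) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not kSpectra's implementation: the function name `ssa_mem_forecast` is hypothetical, the lag-covariance matrix is estimated from the trajectory matrix (Broomhead-King style; kSpectra may use a different estimator), and each retained PC is fitted by ordinary least squares rather than by the maximum-entropy (Burg-type) estimator the MEM name suggests. The extended PCs are turned back into a series by diagonal averaging.

```python
import numpy as np

def ssa_mem_forecast(x, M, components, ar_order, lead):
    """Hypothetical SSA-MEM style forecast (sketch, not kSpectra's code).

    x          : 1-D series of length N
    M          : SSA window
    components : 0-based indices of the retained "signal" components
    ar_order   : order p of the AR model fitted to each retained PC
    lead       : number of steps to forecast
    Returns the reconstruction over N + lead points; the last `lead`
    values are the forecast.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    xc = x - mean
    N = len(xc)
    K = N - M + 1                       # number of lagged vectors (PC length)
    p = ar_order
    if K <= 2 * p:
        raise ValueError("PCs too short for the requested AR order")

    # Trajectory matrix: row i holds xc[i : i + M]  (shape K x M).
    X = np.column_stack([xc[j:j + K] for j in range(M)])
    # Lag-covariance matrix; EOFs sorted by decreasing eigenvalue.
    lam, E = np.linalg.eigh(X.T @ X / K)
    E = E[:, np.argsort(lam)[::-1]]
    E_sel = E[:, list(components)]
    A = X @ E_sel                       # retained PCs, shape (K, r)

    # Fit AR(p) to each PC by least squares, then iterate it forward.
    A_ext = np.empty((K + lead, A.shape[1]))
    A_ext[:K] = A
    for k in range(A.shape[1]):
        a = A[:, k]
        # Column i holds the lag-(i+1) values for targets a[p:].
        lagged = np.column_stack([a[p - 1 - i:K - 1 - i] for i in range(p)])
        coef, *_ = np.linalg.lstsq(lagged, a[p:], rcond=None)
        hist = list(a)
        for h in range(lead):
            recent = np.array(hist[-1:-p - 1:-1])   # most recent value first
            nxt = float(coef @ recent)
            hist.append(nxt)
            A_ext[K + h, k] = nxt

    # Diagonal averaging: Y[i, j] estimates the series at time i + j.
    Y = A_ext @ E_sel.T                 # shape (K + lead, M)
    total = np.zeros(N + lead)
    count = np.zeros(N + lead)
    for j in range(M):
        total[j:j + K + lead] += Y[:, j]
        count[j:j + K + lead] += 1
    return total / count + mean
```

With the settings used later in this demonstration, a call would look like `ssa_mem_forecast(series, M=120, components=range(9), ar_order=120, lead=80)`, where `series` is a hypothetical array holding the contents of ndata1.dat. On a noise-free sinusoid, keeping the leading pair with an AR(2) model reproduces the continuation of the sinusoid almost exactly, which makes a convenient sanity check.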

Here we demonstrate the Toolkit's capabilities to reconstruct and predict a quasi-periodic oscillatory signal contaminated by noise, following the prediction.tkt project in the Examples/SSA Prediction folder of the kSpectra distribution. The synthetic test series, 600 data points long, consists of a low-frequency oscillation with a period of T=40 units. This oscillation is modulated in both amplitude and phase with a period of T=120, and is contaminated by large-amplitude white noise.
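A series with the structure described above can be generated as follows if you want to experiment outside the toolkit. The exact formula behind ndata1.dat is not given here, so the modulation depths (`am_depth`, `pm_depth`), the noise level (`noise_std`) and the function name `make_test_series` are hypothetical choices; only the length (600), the carrier period (40) and the modulation period (120) come from the text.

```python
import numpy as np

def make_test_series(n=600, period=40.0, mod_period=120.0,
                     am_depth=0.5, pm_depth=0.5, noise_std=1.0, seed=0):
    """Hypothetical AM/PM-modulated oscillation plus white noise.

    Returns (signal, signal + noise). Modulation depths and noise level
    are illustrative assumptions, not the values used in ndata1.dat.
    """
    t = np.arange(n)
    mod = 2 * np.pi * t / mod_period
    amplitude = 1.0 + am_depth * np.sin(mod)      # amplitude modulation, period 120
    phase = pm_depth * np.sin(mod)                # phase modulation, period 120
    signal = amplitude * np.sin(2 * np.pi * t / period + phase)
    rng = np.random.default_rng(seed)             # fixed seed for reproducibility
    noisy = signal + rng.normal(0.0, noise_std, size=n)
    return signal, noisy
```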

Our task is to reconstruct and predict the signal (red) from the signal+noise (blue) series. We will use the Cross-Validation feature available in SSA Prediction Options to find the optimal number of components for prediction.

We start kSpectra and go to Data I/O in Tools. Using the Finder, we double-click the project file prediction.tkt in the SSA Prediction folder.

Then we go to SSA in Tools, select ndata1.dat from the Data pop-up menu, change the Window value to M=120 (the longest periodicity in the time series), and set the name in the Spectrum box to ssa1. Then click Compute, followed by Plot, to obtain the SSA spectral estimate:

The low-frequency part of the spectrum contains a few oscillatory pairs, which correspond to the signal. Click Advanced options, change the name in the Result field to ssarc, and select the leading nine rows of the SSA components table, corresponding to the four oscillatory pairs of the signal and the trend.
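Oscillatory modes in SSA typically show up as pairs of nearly equal eigenvalues. A simple first screen for such pairs, given the eigenvalues sorted in decreasing order, is sketched below. The function name `candidate_pairs` and the relative tolerance `tol=0.1` are hypothetical; white-noise eigenvalues in the tail of the spectrum are also close to each other, so a check like this is only meaningful for the leading components and should be confirmed by inspecting the EOFs (e.g. similar frequency, quadrature).

```python
import numpy as np

def candidate_pairs(eigenvalues, tol=0.1):
    """Return 0-based index pairs (k, k+1) whose eigenvalues differ by
    less than `tol` relative to the larger one. Assumes descending order."""
    lam = np.asarray(eigenvalues, dtype=float)
    pairs = []
    k = 0
    while k < len(lam) - 1:
        if lam[k] > 0 and (lam[k] - lam[k + 1]) / lam[k] < tol:
            pairs.append((k, k + 1))
            k += 2          # a component belongs to at most one pair
        else:
            k += 1
    return pairs
```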

Click Prediction options and, at the top of the panel, set the forecast Lead time to 80 and the AR order to 120.
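As a rough check on how much data supports an AR model of this order, assume the trajectory-matrix formulation, in which each PC has K = N - M + 1 samples, and an ordinary least-squares AR fit to each PC. The symbols a_k (k-th PC), φ (AR coefficients) and ε (residual) are introduced here for illustration only:

```latex
\begin{aligned}
K &= N - M + 1 = 600 - 120 + 1 = 481 \\
a_k(t) &= \sum_{i=1}^{p} \phi_{k,i}\, a_k(t-i) + \varepsilon_k(t), \qquad t = p, \dots, K-1 \\
K - p &= 481 - 120 = 361 \ \text{equations for } p = 120 \text{ unknown coefficients}
\end{aligned}
```

That is roughly three equations per coefficient, so the coefficient estimates may be noticeably uncertain at this order; this is the resolution-versus-variance trade-off that cross-validation helps to explore.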

Then we click Compute followed by Plot to obtain the figure below:

[Figure: SSA prediction]

If SSA cross-validation has been performed (see below), the forecast will include confidence levels. Cross-validation is useful for assessing the quality of the SSA forecast and for finding optimal parameters for prediction. The basic idea is to shorten the time series, perform an SSA forecast, and compare the prediction with the original record (which we know) at a particular lead time.


Cross-validation for SSA prediction

The top of the Prediction options panel has fields for the Lead time and the order of the AR model used for the actual prediction. The cross-validated forecast settings in the middle of the panel can be used to 1) find the optimum SSA parameters for prediction and 2) estimate the forecast skill. We will do this for the ndata1.dat (signal+noise) time series: select it in the Data box of the SSA tool, and set the number of SSA components to 20 and the SSA window to 120. Proceed to Advanced and then Prediction Options, and set the parameters as follows:

Set the Start and End values of the cross-validation interval to 360 (by default 3 × SSA window size, i.e. 3 × 120 in our case) and 520 (set automatically as the time series length [600] minus Lead [80] = 520), respectively. Click Compute to calculate the cross-validated forecast over this interval as a function of both the lead time (up to its maximum specified in Lead at the top of Prediction Options) AND the number of retained SSA components (up to the maximum defined on the main SSA panel).

The results of cross-validation are stored as a Forecast matrix for the various lead times and numbers of retained SSA components. In addition, the forecast RMS error, the anomaly correlation, and the Verification time series are stored as matrices whose names are formed by prefixing "rms_", "cor_", and "dat_" to the Forecast name; they can be accessed in the Data I/O tool. Using Plot options, the user can display the mean forecast skill, averaged over all lead times, as a function of the number of retained SSA components. Alternatively, the user can display the skill as a function of lead time for a particular number of retained SSA modes, or compare the forecast and verification time series (Verification option).
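The hindcast procedure described above can be sketched as a generic harness. This is a hypothetical illustration, not kSpectra's code: `cross_validate` and the `forecaster(history, lead)` callback are our own names, origins run from Start to End inclusive, the forecast made at origin t0 uses only x[:t0] and is verified against x[t0:t0+lead], and the anomaly correlation is computed here as the correlation across origins at each lead, which may differ in detail from the toolkit's definition.

```python
import numpy as np

def cross_validate(x, forecaster, start, end, max_lead):
    """Hindcast skill of `forecaster` for origins start..end (inclusive).

    forecaster(history, lead) must return `lead` values following history.
    Returns (rms, cor, forecasts, verification); rms and cor have one
    entry per lead time.
    """
    x = np.asarray(x, dtype=float)
    if end + max_lead > len(x):
        raise ValueError("end + max_lead exceeds the series length")
    origins = range(start, end + 1)
    fc = np.array([forecaster(x[:t0], max_lead) for t0 in origins])
    ver = np.array([x[t0:t0 + max_lead] for t0 in origins])
    # RMS error at each lead, averaged over all origins.
    rms = np.sqrt(np.mean((fc - ver) ** 2, axis=0))
    # Correlation of forecast and verification anomalies at each lead.
    fa = fc - fc.mean(axis=0)
    va = ver - ver.mean(axis=0)
    cor = (fa * va).sum(axis=0) / np.sqrt((fa ** 2).sum(axis=0) * (va ** 2).sum(axis=0))
    return rms, cor, fc, ver
```

Calling it once per number of retained components (for example with an SSA-MEM forecaster that keeps the leading k components) and stacking the results row by row gives matrices analogous to the rms_ and cor_ outputs.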

Results of the cross-validated forecast skill for our signal+noise data are shown below:

There is a maximum of Correlation (and a minimum of RMS error) for 8-9 retained SSA components, which contain the signal. It can be useful to vary the SSA window size and/or the AR order and repeat cross-validation to see whether the forecast skill can be improved. To compare the cross-validated time series of our SSA forecast with the data, we choose the Verification option, set the lead time to 20 and the number of SSA components to 9 at the bottom of the Prediction Options panel, and hit Plot:
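Picking the number of components from the cross-validated skill amounts to averaging each skill matrix over lead time and locating the best row. The sketch below assumes a layout in which row k-1 holds the skill for k retained components and each column is a lead time; this layout and the name `best_component_count` are assumptions, not kSpectra's file format.

```python
import numpy as np

def best_component_count(rms, cor):
    """Return (k minimizing mean RMS, k maximizing mean correlation).

    rms, cor: arrays of shape (n_components, max_lead); row k-1 is
    assumed to correspond to k retained components.
    """
    mean_rms = np.asarray(rms, dtype=float).mean(axis=1)   # average over leads
    mean_cor = np.asarray(cor, dtype=float).mean(axis=1)
    return int(np.argmin(mean_rms)) + 1, int(np.argmax(mean_cor)) + 1
```

When the two criteria disagree, looking at the skill curves as a function of lead time is a reasonable tie-breaker.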

We see that our prediction generally follows the low-frequency signal, capturing both its phase and amplitude quite well. The red confidence intervals are based on the ±RMS cross-validation error at the chosen lead time. The validated time series lies well within the confidence levels.
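The ±RMS band and how often the verification data fall inside it can be computed as below. The function names are hypothetical. For roughly unbiased, Gaussian forecast errors, about 68% of points would be expected within ±1 RMS, which gives a reference value for judging the coverage.

```python
import numpy as np

def rms_band(forecast, rms):
    """Lower and upper bounds of a forecast ± RMS band."""
    f = np.asarray(forecast, dtype=float)
    return f - rms, f + rms

def band_coverage(forecast, verification, rms):
    """Fraction of verification points lying inside forecast ± rms."""
    lower, upper = rms_band(forecast, rms)
    v = np.asarray(verification, dtype=float)
    return float(((v >= lower) & (v <= upper)).mean())
```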

Feel free to experiment with different AR orders and compare the prediction results. Results of SSA prediction naturally depend on the amount of noise in the data, and improve as the noise in the dataset is reduced, as in the figure below (see the SSA Prediction example project).