3. TOOLKIT DEMONSTRATION

-Univariate Gap filling by SSA

(Note: In the demo version, SSA Gap-Filling is available only for data in the example projects. In a licensed copy, this feature is enabled after activation with a purchased Serial No.)

A novel, iterative form of SSA is used to analyze univariate datasets with uneven sampling or missing observations. Gaps are filled in by exploiting temporal correlations in the dataset. "NaN" values (case-insensitive) in a data file are treated as missing. The gap-filling feature is available in the Advanced options of the SSA panel. For multivariate datasets, please see MSSA Gap-filling.

The user needs to select the data in the Data pop-up menu of the SSA tool, specify the SSA window size (large enough to cover the longest temporal correlations; it can be the largest temporal periodicity in the dataset), and set the number of SSA components. Gap-filling is then done by clicking Compute in the Gap-filling box of the Advanced options. The filled-in data are stored in the vector whose name is specified in the Result box.

[Figure: gap filling]
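The iterative procedure described above can be illustrated with a minimal Python sketch. It is not the kSpectra implementation: the function names `ssa_reconstruct` and `ssa_gap_fill`, the `tol` and `max_iter` parameters, and the choice to keep a fixed number of components on every iteration are all assumptions made for illustration. Missing points start at zero in the centered series and are repeatedly replaced by the reconstruction from the leading SSA components until they stop changing.

```python
import numpy as np

def ssa_reconstruct(x, window, n_components):
    """Reconstruct a complete 1-D series from its leading SSA components."""
    n = len(x)
    k = n - window + 1
    # Trajectory matrix: column j holds the lagged window x[j:j+window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging maps the low-rank matrix back to a series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts

def ssa_gap_fill(x, window, n_components, tol=1e-6, max_iter=500):
    """Fill NaN entries of x iteratively (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    mean = np.nanmean(x)
    # Center the available data; missing points start at zero.
    filled = np.where(missing, 0.0, x - mean)
    if not missing.any():
        return filled + mean
    for _ in range(max_iter):
        recon = ssa_reconstruct(filled, window, n_components)
        change = np.max(np.abs(recon[missing] - filled[missing]))
        # Only the missing points are updated; available data stay fixed.
        filled[missing] = recon[missing]
        if change < tol:  # hypothetical convergence criterion
            break
    return filled + mean
```

In this sketch `window` and `n_components` play the roles of the SSA window size and the number of SSA components chosen in the panel.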

By clicking Plot, the user can compare the gappy dataset with the dataset in which the missing values have been filled in. When plotting missing data, the user can select an option in Preferences to connect all available points through the gaps.

The number of SSA components to use depends on the dataset, and in particular on the amount of noise present. The main idea is to discard the higher-ranked (trailing, low-variance) components that correspond to noise. If the CVL error box is checked in the Gap-filling options, a number of cross-validation experiments (set in Preferences) are performed: in each one, a small portion of the existing points is flagged at random as missing, and the RMS error of the filled-in data is calculated at those points. The optimum number of components corresponds to the minimum of this error averaged over all cross-validation sets. The error can be plotted with the Plot CVL button. The random seed for choosing the cross-validation points can be changed in Preferences, as can the convergence criterion for the missing values. The user can repeat such cross-validation experiments for different SSA Window values in order to find the optimum gap-filling parameters. In addition, the range of the filled-in values can be constrained by setting the optional Max and Min limits. The percentage of the dataset variance used to fill the gaps is written to the Log.
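The cross-validation idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the Toolkit's routine: `ssa_fill` is a simplified stand-in filler with a fixed iteration count, and the names `cv_error`, `n_sets`, `frac`, and `seed` are hypothetical counterparts of the settings described above.

```python
import numpy as np

def ssa_fill(x, window, n_comp, n_iter=50):
    """Simplified iterative SSA filler (a stand-in, not the kSpectra code)."""
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    mean = np.nanmean(x)
    y = np.where(miss, 0.0, x - mean)
    n = len(y)
    k = n - window + 1
    for _ in range(n_iter):
        traj = np.column_stack([y[j:j + window] for j in range(k)])
        u, s, vt = np.linalg.svd(traj, full_matrices=False)
        approx = (u[:, :n_comp] * s[:n_comp]) @ vt[:n_comp]
        recon = np.zeros(n)
        counts = np.zeros(n)
        for j in range(k):  # diagonal averaging
            recon[j:j + window] += approx[:, j]
            counts[j:j + window] += 1
        y[miss] = (recon / counts)[miss]
    return y + mean

def cv_error(x, window, n_comp, n_sets=5, frac=0.05, seed=0):
    """Mean RMS error over n_sets random hold-outs of existing points."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    known = np.flatnonzero(~np.isnan(x))
    n_hold = max(1, int(frac * known.size))
    errors = []
    for _ in range(n_sets):
        # Flag a small random portion of the existing points as missing.
        hold = rng.choice(known, size=n_hold, replace=False)
        trial = x.copy()
        trial[hold] = np.nan
        filled = ssa_fill(trial, window, n_comp)
        errors.append(np.sqrt(np.mean((filled[hold] - x[hold]) ** 2)))
    return float(np.mean(errors))
```

Evaluating `cv_error` for a range of `n_comp` values (and, if desired, several `window` values) and taking the minimum mirrors the parameter search described above.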

If the smooth box is checked in the Gap-filling options, the Result will be the estimated smooth component of the dataset everywhere, including at the points with available data. Otherwise, the Result keeps the values of the existing data, and only the missing values are filled in with the smooth component. If results from several gap-filling calculations have been stored in different vectors, the parameters used (including Preferences) are restored in the GUI by simply selecting the corresponding vector from the Result pop-up list.
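The difference between the two Result modes can be shown with a short Python sketch. The function name `assemble_result` and its arguments are hypothetical; `smooth_component` stands for the estimated smooth component produced by whatever gap-filling routine is in use.

```python
import numpy as np

def assemble_result(data, smooth_component, smooth=False):
    """Combine original data (NaN = missing) with the smooth component."""
    data = np.asarray(data, dtype=float)
    smooth_component = np.asarray(smooth_component, dtype=float)
    if smooth:
        # Smooth box checked: use the smooth component everywhere.
        return smooth_component.copy()
    # Otherwise keep existing data and fill only the missing points.
    return np.where(np.isnan(data), smooth_component, data)
```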

Here we demonstrate the Toolkit's gap-filling capabilities on synthetic time series, with and without noise, following the Examples/Univariate Gap Filling folder of the kSpectra distribution.

First we consider a synthetic test series, 600 data points long, which consists of an oscillatory component with a period of T=40 units. This oscillation is modulated in both amplitude and phase with a period of T=120. The gaps are created by selecting 91% of the data points at random. The figure below shows the filled-in and gappy datasets.

[Figure: missing data]
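A series of this general kind can be generated with a few lines of Python. This is only a sketch: the text does not give the exact functional form, so the modulation depths `amp_depth` and `phase_depth` and the sinusoidal shape of the modulation are assumptions, and `make_gaps` with its `fraction_missing` parameter is a hypothetical helper whose value below is purely illustrative and not taken from the example project.

```python
import numpy as np

n = 600                      # series length, as in the text
t = np.arange(n)
carrier_period = 40.0        # period of the oscillation (T=40)
modulation_period = 120.0    # period of the modulation (T=120)
amp_depth = 0.5              # hypothetical amplitude-modulation depth
phase_depth = 1.0            # hypothetical phase-modulation depth (radians)

mod = 2 * np.pi * t / modulation_period
series = (1 + amp_depth * np.cos(mod)) * np.cos(
    2 * np.pi * t / carrier_period + phase_depth * np.sin(mod)
)

def make_gaps(x, fraction_missing, seed=0):
    """Return a copy of x with a random fraction of points set to NaN."""
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=float)  # np.array copies, so the input is untouched
    n_missing = int(round(fraction_missing * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x[idx] = np.nan
    return x

# Illustrative value only; choose the fraction to suit the experiment.
gappy = make_gaps(series, fraction_missing=0.5)
```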

By using "draw line through data gaps" in Preferences (see above), the user can compare the SSA-filled data with linear interpolation between the available data points:

[Figure: missing observations]
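For readers who want to reproduce the linear-interpolation baseline outside the GUI, a minimal Python sketch follows. The helper name `linear_fill` is hypothetical; it relies on NumPy's `np.interp`, which connects the available points with straight lines and holds the first or last available value constant beyond the ends.

```python
import numpy as np

def linear_fill(x):
    """Fill NaN gaps by linear interpolation between available points."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    known = ~np.isnan(x)
    # Interpolate at every index from the available samples only.
    return np.interp(t, t[known], x[known])
```

Comparing the output of such a baseline with an SSA-filled series gives the same kind of contrast as the plot described above.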

The next figure shows the nearly perfect reconstruction of the missing observations by comparing the filled-in data with the original (full) data.