Utilities

Alt-U

 

Utilities Menu

[Add new variables][PCA-predictions][PLS-predictions][Extract objects][Merge two files]

 

This menu gives access to miscellaneous commands that perform operations not included in the standard data analysis procedure.

 


 

Utilities>>>Add new variables

 

This command adds one or more variables (max 10) to the data set. First a dialog window is presented, in order to choose the number of new variables to add:

 

Add Variables Dialog I

 

Once the number of new variables has been entered, press the OK button, or press the Cancel button to abort the operation. A new dialog window is then presented:

 

Add Variables Dialog II

 

To add the new variables follow this procedure:

  1. Enter the values of the new variables (3 in this example) for the first object into the var 1, var 2 and var 3 input fields. If no value is entered, a value of 0.000 is assumed by default.
  2. Click the first up and down arrow buttons to move to the next object. The number and the name of the active object appear in the Object: field.
  3. Repeat this operation for every object in the data file.

 

 

When the operation is completed, press the OK button, or press the Cancel button to abort the operation. The new variable(s) will be added at the end of the dataset, in a block labeled "ADDED UNKNOW KIND", and marked as type "not use" (N).
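
The effect of this command on the data matrix can be sketched as follows. This is a minimal Python illustration, not GOLPE code: the function name `add_variables` and the list-of-rows representation are assumptions made for the example. Only the limit of 10 variables, the 0.000 default and the placement at the end of the dataset come from the description above.

```python
MAX_NEW_VARIABLES = 10  # limit stated for the Add new variables command


def add_variables(data, n_new, values=None):
    """Return a copy of `data` with `n_new` columns appended to every row.

    data   -- list of rows (one row per object), each a list of floats
    values -- optional dict {object_index: [v1, ..., vn]} (hypothetical
              input format); missing entries get 0.0, mirroring the
              0.000 default of the dialog
    """
    if not 1 <= n_new <= MAX_NEW_VARIABLES:
        raise ValueError("number of new variables must be between 1 and 10")
    values = values or {}
    result = []
    for i, row in enumerate(data):
        new = list(values.get(i, []))
        if len(new) > n_new:
            raise ValueError(f"too many values for object {i}")
        new += [0.0] * (n_new - len(new))  # unfilled fields default to 0.0
        result.append(list(row) + new)     # new variables go at the end
    return result


data = [[1.0, 2.0], [3.0, 4.0]]
extended = add_variables(data, 3, {0: [0.5, 0.6, 0.7]})
print(extended)
```

The second object receives three zeros because no values were supplied for it, which corresponds to leaving the input fields empty in the dialog.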

 

IMPORTANT: Don't forget to update the type of variables using the File>>>Type of variables>>>Modify command.

 


 

Utilities>>>PCA-predictions

 

A PCA model provides a simplified representation of the original X matrix as the product of two matrices: a loading (P) matrix and a scores (T) matrix. The latter can be used to detect structure among the objects: clusters, similarities, outliers, etc.

It is possible to apply a given PCA model to an external dataset (X*) in order to obtain "predicted" scores (T*), which can be used to draw scores plots showing both the original series (T) and the external objects (T*). This representation can be seen as a projection of the external series onto the same dimensionally-reduced space obtained for the original series, with a rotation defined by the original loading matrix (P).

 

In the original PCA model:

X = TP' + E

for an external dataset X*:

X* = T*P' + E*

and, using the NIPALS algorithm, for a given dimension a:

t_a* = X* p_a / (p_a' p_a)
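
The projection above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the GOLPE implementation: the function names are hypothetical, the external data are assumed to be already pretreated in the same way as the data used to build the model, and the deflation of X* after each dimension is the usual NIPALS step, which the text above does not spell out.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def predict_scores(x_ext, loadings):
    """Project external objects onto PCA loadings, one dimension at a time.

    x_ext    -- list of rows of X* (assumed pretreated like the model data)
    loadings -- list of loading vectors p_a, one per model dimension
    Returns T* as a list of rows (objects x dimensions).
    """
    residual = [list(row) for row in x_ext]   # start from X* itself
    scores = [[] for _ in x_ext]
    for p in loadings:
        pp = dot(p, p)
        for i, row in enumerate(residual):
            t = dot(row, p) / pp              # t_a* = X* p_a / (p_a' p_a)
            scores[i].append(t)
            # assumed NIPALS-style deflation: remove the part explained by p_a
            residual[i] = [x - t * pj for x, pj in zip(row, p)]
    return scores


loadings = [[1.0, 0.0], [0.0, 2.0]]           # made-up loadings for the demo
t_ext = predict_scores([[2.0, 3.0], [4.0, 1.0]], loadings)
print(t_ext)
```

Dividing by p_a' p_a means the loadings do not need to be normalized to unit length for the formula to hold.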

 

In GOLPE, the predicted scores can be used to obtain mixed 2D and 3D scores plots (using plot>>>2D plots>>>PCA-scores and plot>>>3D plots>>>PCA-scores). It is also possible to list the values of the scores using list>>>PCA-scores ext. pred.

 

First, the User is asked whether cutoffs should be applied to the external data file.

 

PLS Predictions Dialog

 

If the User presses the Yes button, the same cutoffs that appear in Pretreatment>>>Classic Pretreatment>>>Set-up pretreatment will be applied. Please note that, even if the User has never used this command, default maximum and minimum cutoffs exist, corresponding to the maximum and minimum values in the current data file. If the User presses the No button, no cutoff is applied to the external data file.
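
As an illustration of what answering Yes implies, the following Python sketch assumes that applying a cutoff means truncating every value to the range between the minimum and the maximum cutoff. This reading, like the function names, is an assumption made for the example; the exact treatment is defined by the Set-up pretreatment command.

```python
def default_cutoffs(data):
    """Default cutoffs: the minimum and maximum values of the current data."""
    flat = [v for row in data for v in row]
    return min(flat), max(flat)


def apply_cutoffs(data, minimum, maximum):
    """Truncate every value in `data` (list of rows) to [minimum, maximum].

    Assumption: a cutoff replaces out-of-range values by the limit itself.
    """
    if minimum > maximum:
        raise ValueError("minimum cutoff is larger than maximum cutoff")
    return [[min(max(v, minimum), maximum) for v in row] for row in data]


current = [[-1.0, 2.0], [0.5, 3.0]]           # made-up current data file
external = [[-4.0, 1.0], [5.0, 2.5]]          # made-up external data file
low, high = default_cutoffs(current)
treated = apply_cutoffs(external, low, high)
print(low, high, treated)
```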

 

Then a standard file selection dialog will be opened and the User will be asked to select the external data file. See the File>>>Open data file command for details about the file selection dialog.

Once the selection has been made, GOLPE lists in the main window the percentage of the sum of squares (%SS) explained for each object and for each model dimensionality. At the end, it also lists a summary of the percentage of the sum of squares explained for each model dimensionality (as SSExp, and accumulated as SSAccum), referred to the whole external set.

 

External PCA predictions for /disk2/people/manolo/golpe/prb/FFD.dat

% SS explained for each object and model dimensionality

1    d1a      27.75    53.00    83.76    83.78    95.72
2    d2a       0.89    10.43    67.26    71.03    72.67
6    d3a      43.39    72.35    89.43    90.37    90.69
10   d4a      55.30    57.50    91.28    92.97    94.16
14   d5a      70.95    74.27    82.88    86.43    93.55
16   d6a       0.13     1.27    57.45    60.94    95.75
20   d7a      48.10    53.62    64.73    71.87    84.11
24   d8a       8.85    44.42    69.96    80.41    81.13
26   d9a      55.16    60.05    60.31    84.45    91.85
30   d10a     31.86    40.81    51.21    86.77    97.58
32   d11a     29.88    31.59    39.30    92.04    95.43
36   d12a      6.41    80.44    93.73    94.50    97.95
40   d13a     33.81    61.57    70.95    83.42    88.23
41   d14a      8.36    70.09    71.12    71.51    97.19
45   d15a     40.15    70.65    78.25    83.80    85.30


      components     SSExp     SSAccum
          1        30.3948     30.3948
          2        24.2823     54.6771
          3        17.8427     72.5198
          4        10.0963     82.6160
          5         8.9537     91.5697
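
One plausible way to obtain figures of this kind is sketched below in Python. It is not taken from GOLPE: the function names are hypothetical, and the sketch assumes that the per-object values are cumulative percentages of each object's own sum of squares, that SSExp and SSAccum refer to the total sum of squares of the external set, and that X* has already been pretreated like the model data.

```python
def sum_sq(row):
    """Sum of squares of one row."""
    return sum(v * v for v in row)


def percent_ss_explained(x_ext, loadings):
    """Return (per_object, ss_exp, ss_accum) for an external data set.

    per_object -- one list per object: cumulative %SS after each dimension
    ss_exp     -- % of the total SS explained by each dimension alone
    ss_accum   -- running total of ss_exp
    Objects whose sum of squares is zero are not handled in this sketch.
    """
    residual = [list(row) for row in x_ext]
    start = [sum_sq(row) for row in residual]
    total = sum(start)
    per_object = [[] for _ in x_ext]
    ss_exp, ss_accum = [], []
    previous = total
    for p in loadings:
        pp = sum_sq(p)
        for i, row in enumerate(residual):
            t = sum(x * pj for x, pj in zip(row, p)) / pp
            residual[i] = [x - t * pj for x, pj in zip(row, p)]
            per_object[i].append(100.0 * (1.0 - sum_sq(residual[i]) / start[i]))
        remaining = sum(sum_sq(row) for row in residual)
        ss_exp.append(100.0 * (previous - remaining) / total)
        ss_accum.append(100.0 * (total - remaining) / total)
        previous = remaining
    return per_object, ss_exp, ss_accum


# made-up one-object example with two orthogonal unit loadings
per_object, ss_exp, ss_accum = percent_ss_explained(
    [[3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]])
print(per_object, ss_exp, ss_accum)
```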


 

Utilities>>>PLS-predictions

 

Once the PLS model is built, it can be applied to any external data file in order to predict the activity of different molecules. This option is also useful for checking the predictive power of the model on an external validation set, in addition to the self-consistency of the SDEP procedure along the stepwise process of variable selection. Please note that the type and number of variables in the external data file must exactly match the type and number of variables in the data file used to generate the PLS model. This also applies to the Y-variables; when they are unknown, arbitrary numerical values should be entered in the external data file.

 

First, the User is asked whether cutoffs should be applied to the external data file.

 

PLS Predictions Dialog

 

If the User presses the Yes button, the same cutoffs that appear in Pretreatment>>>Classic Pretreatment>>>Set-up pretreatment will be applied. Please note that, even if the User has never used this command, default maximum and minimum cutoffs exist, corresponding to the maximum and minimum values in the current data file. If the User presses the No button, no cutoff is applied to the external data file.

 

Then a standard file selection dialog is opened and the User is asked to select the external data file. See the File>>>Open data file command for details about the file selection dialog. Once the selection has been made, GOLPE shows in the main window the predicted Y-values for each PLS model dimensionality.

 

Additionally, in order to use the external data file for validating the PLS model, GOLPE will use the Y-variables provided to calculate the SDEP (external), component by component.

 

SDEP (external): Standard Deviation of Error of Predictions (external).

 

Y : Y-value in the external data file.

Y' : Predicted value.

Ȳ : Average value.

N : Number of objects.
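
A minimal Python sketch of the calculation follows. The formula itself is not reproduced in this text, so the sketch assumes the commonly used definition SDEP = sqrt( sum((Y - Y')^2) / N ); the function name `sdep` and the example values are hypothetical.

```python
import math


def sdep(y_observed, y_predicted):
    """Assumed definition: SDEP = sqrt( sum((Y - Y')^2) / N )."""
    if len(y_observed) != len(y_predicted) or not y_observed:
        raise ValueError("need two non-empty sequences of equal length")
    press = sum((y - yp) ** 2 for y, yp in zip(y_observed, y_predicted))
    return math.sqrt(press / len(y_observed))


# made-up Y-values and predictions for models with 1 and 2 components
y_ext = [1.0, 2.0, 3.0, 4.0]
predictions = {1: [3.0, 4.0, 5.0, 6.0], 2: [1.0, 2.0, 3.0, 4.0]}
for n_components, y_pred in predictions.items():
    print(n_components, sdep(y_ext, y_pred))
```

Computing the value once per model dimensionality mirrors the component-by-component report described above.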

 

If the Y-variables for the external data file are unknown and the User has entered only dummy values, the SDEP value has no meaning and can be ignored.

 


 

Utilities>>>Extract objects

 

This command extracts some objects from the data file and stores them in a new data file, in SIMCA/GOLPE format. This command is useful for removing outliers and for splitting a data file into a training set and an external prediction set.

 

Objects Extraction Dialog

 

Unselected objects:

This list contains all the objects which are not to be included in the new data file. By default all the objects in the current data file are included in this list.

 

Selected objects:

This list contains all the objects which will be included in the new data file. By default no objects in the current data file are included in this list.

 

 

When the Selected objects list contains all the objects to be included in the new file, press the OK button. Then a standard file selection dialog will appear and the User will be prompted to provide a name for the new file. See the File>>>Open data file command for details about the file selection dialog.

 

To use the new data file it must be loaded using the File>>>Open data file command.
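
The selection logic can be illustrated with a short Python sketch. The function name and the in-memory representation are assumptions made for the example; writing the result in SIMCA/GOLPE format is not shown because the format is not described here. The command itself stores only the selected objects; the sketch also returns the unselected ones, which is convenient when splitting a data set into a training set and an external prediction set.

```python
def extract_objects(names, data, selected):
    """Split a data set into selected and unselected objects.

    names    -- object names, one per row of `data`
    data     -- list of rows
    selected -- iterable of object names to place in the new data set
    Returns ((sel_names, sel_rows), (rest_names, rest_rows)).
    """
    wanted = set(selected)
    unknown = wanted - set(names)
    if unknown:
        raise KeyError(f"unknown objects: {sorted(unknown)}")
    sel = ([], [])
    rest = ([], [])
    for name, row in zip(names, data):
        target = sel if name in wanted else rest
        target[0].append(name)
        target[1].append(list(row))  # copy so the original rows stay untouched
    return sel, rest


names = ["d1a", "d2a", "d3a"]                 # made-up objects
data = [[1.0], [2.0], [3.0]]
sel, rest = extract_objects(names, data, ["d2a"])
print(sel, rest)
```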

 


 

Utilities>>>Merge two files

 

This command allows the User to create a larger data file by adding the contents of a second data file. If the second data file contains the same number of objects, it is added "side by side", adding new variables to the currently open data file. If the second data file contains the same number of variables, it is added at the bottom, adding new objects to the currently open file. If the second data file has neither the same number of variables nor the same number of objects, the merging operation is aborted.
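
The decision rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than GOLPE's own code: the function name is hypothetical, both matrices are assumed to be non-empty and rectangular, and when both the number of objects and the number of variables match the sketch merges side by side, simply following the order of the description (the actual behaviour in that case is not stated).

```python
def merge(current, second):
    """Merge two data matrices (lists of rows) following the rules above."""
    n_obj, n_var = len(current), len(current[0])
    m_obj, m_var = len(second), len(second[0])
    if m_obj == n_obj:
        # same number of objects: add new variables side by side
        return [list(a) + list(b) for a, b in zip(current, second)]
    if m_var == n_var:
        # same number of variables: add new objects at the bottom
        return [list(r) for r in current] + [list(r) for r in second]
    raise ValueError("neither objects nor variables match; merge aborted")


current = [[1.0, 2.0], [3.0, 4.0]]            # made-up current data file
wider = merge(current, [[5.0], [6.0]])        # 2 objects, 1 new variable
longer = merge(current, [[5.0, 6.0]])         # 1 new object, 2 variables
print(wider, longer)
```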

 

First the user is prompted to select the new data file, using a standard file selection dialog. Once the user has selected the file name, GOLPE reads the new file.

 

If the selected file has the same number of variables, GOLPE presents this dialog:

 

[Dialog image: second file with the same number of variables]

 

If the selected file has the same number of objects, GOLPE presents this dialog:

 

[Dialog image: second file with the same number of objects]

 

In either case, if the User presses the OK button, the new variables or objects are added to the current data file, which is automatically reloaded and updated. If the User presses the Cancel button, the operation is aborted and no change is applied to the current data file.