Pretreatment

Alt-t

 

Pretreatment menu

[Advanced Pretreatment][Classic Pretreatment][Delete unselected var.s (D-optimal)][Delete unselected var.s (F. Factorial)][Delete specific var.s][Reload original variables]

 


Pretreatment>>>Advanced Pretreatment

 

[minimum cutoff][maximum cutoff][zeroing values][min. sd cutoff][n-level vars.][autoscale][BUW scale][D-opt. exclude][FFD exclude][Transforms][Cut out]

 

Data pretreatment is a very important step in any 3D-QSAR analysis, since the results depend dramatically on the cutoff values, on the removal of dangerous variables, on the scaling strategy, and so on. The Pretreatment>>>Classic Pretreatment commands offer a simple way to apply a basic pretreatment to the data file. The Advanced pretreatment tool offers a much better alternative and has been specifically designed to deal with grid field energy data.

 

It follows an interactive/sequential philosophy:

 

It can be used to perform pretreatment, but also to open a window on the data in order to obtain information. The Advanced pretreatment gives the User a great deal of information about the data that cannot be obtained in any other way in GOLPE. In particular, for each block of variables, the User can obtain:

 

The Advanced pretreatment can be seen as an independent application in itself. It always works on a copy of the data file loaded in GOLPE. The User can perform one or many pretreatment operations, one after the other, and at the end the modified data can be stored in a new data file in SIMCA/GOLPE format. If the data are not saved, all the changes are lost, because no changes are applied to the data currently loaded in GOLPE.

 

IMPORTANT: The only way to use the pretreated data is to save them in a new file and load it back with the File>>>Open data file command.

Our advice is always to use the Advanced pretreatment as the first step of the analysis. Even if no pretreatment is performed, the information obtained can help the User to understand the problem, and the models obtained in later stages, much better.

 

The command opens a dialog window which contains three areas:

 

Advanced Pretreatment dialog

 

1. A scrolled text area similar to the main window in GOLPE. The program displays in this area information about the data file at startup and after each single pretreatment operation. All the information displayed in this window is also written in a log file named

filename.dat.PretreatToolLog.

 

2. A button column. Each button opens a tool which performs a single pretreatment operation on the data. Click on the links to see a detailed description of each tool:

[minimum cutoff][maximum cutoff][zeroing values][min. sd cutoff][n-level vars.][autoscale][BUW scale][D-opt. exclude][FFD exclude][Transforms][Cut out]

 

3. A button row at the bottom of the dialog:

Save Stores all the modifications performed on the data in a new data file, in SIMCA/GOLPE format.
Exit Exits from the Advanced pretreatment without saving the modified data. If the data were not previously stored, the User will be prompted to confirm the exit.
Reload A new copy of the data file currently loaded in GOLPE will be copied to the Advanced pretreatment. All the modifications applied to the data will be lost.

 


 

Pretreatment>>>Classic Pretreatment

 

Classic Pretreatment Submenu

 


 

Pretreatment>>>Classic Pretreatment>>>Set-up pretreatment

 

This command allows the User to define how the data will be pretreated. Please note that GOLPE automatically applies pretreatment to any data file before using it for modeling, variable selection, plots, and so on. If the User does not explicitly define the pretreatment operations to be used, the program applies a default (dummy) pretreatment.

 

We recommend using this command for simple pretreatments, and the advanced pretreatment available in Pretreatment>>>Advanced Pretreatment for more complete ones.

 

A dialog window like this is presented:

 

Classic Pretreatment Dialog

 

autoscale

Select yes to apply autoscaling to this block of variables, or keep it unselected to leave the data unchanged. By default the program does not autoscale the data.

Autoscaling variables can have a very large influence on the quality of the model. We suggest that the User does not autoscale the data unless it is clearly appropriate.
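
As an illustration of what autoscaling usually means in chemometrics (each variable is mean-centred and divided by its standard deviation), here is a minimal sketch using NumPy. The function name autoscale is hypothetical and this is not GOLPE's own code; in particular, the use of the sample standard deviation (n-1 denominator) and the handling of constant variables are assumptions.

```python
import numpy as np

def autoscale(block):
    """Centre each column (variable) and scale it to unit standard deviation."""
    block = np.asarray(block, dtype=float)
    mean = block.mean(axis=0)
    sd = block.std(axis=0, ddof=1)        # assumption: sample standard deviation
    sd_safe = np.where(sd > 0, sd, 1.0)   # constant columns are only centred
    return (block - mean) / sd_safe

# Rows are objects (compounds), columns are variables.
x = np.array([[1.0, 10.0],
              [2.0, 10.0],
              [3.0, 10.0]])
scaled = autoscale(x)
```

After autoscaling every non-constant variable has the same variance, which is why small, noisy variables can gain as much influence as large, informative ones.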

 

factor

A coefficient that weights this block of variables. Initially all the weights are set to 1.000 (no block weighting is applied). If the User wants to change the influence of some blocks of variables on the model, their weight factors can be set accordingly. This is the case, for instance, when a single variable (like Log P) is included together with a large set of field variables. If all the blocks are given the same weight, the effect of Log P on the model would most probably be hidden by the effect of the field variables, simply because the field block contains many more variables.
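
The effect of a block factor can be pictured as multiplying every variable in the block by the same coefficient. The following is a minimal sketch under that assumption; apply_block_factors and the example data are hypothetical and do not reproduce GOLPE's internal implementation.

```python
import numpy as np

def apply_block_factors(blocks, factors):
    """Multiply every variable of each block by that block's weight factor."""
    weighted = [np.asarray(b, dtype=float) * f for b, f in zip(blocks, factors)]
    return np.hstack(weighted)

field = np.ones((3, 4))                   # stand-in for a large field block
logp = np.array([[0.5], [1.0], [1.5]])    # a single Log P variable
# Factor 1.0 leaves the field block unchanged; 2.0 doubles the Log P block.
weighted = apply_block_factors([field, logp], [1.0, 2.0])
```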

 

min value

The minimum acceptable value in this block of variables. If any value, for any of the variables included in this block, is lower than this minimum, the value is truncated and replaced by the value of the minimum.

The default value is the minimum value present, for any of the variables included in this block, in the data file.

 

max value

The maximum acceptable value in this block of variables. If any value, for any of the variables included in this block, is higher than this maximum, the value is truncated and replaced by the value of the maximum.

The default value is the maximum value present, for any of the variables included in this block, in the data file.
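
The min value and max value settings together amount to clipping every value of the block to a range. A minimal sketch with NumPy follows; truncate_block and the example energies are hypothetical, and the defaults mimic the behaviour described above (the block's own minimum and maximum, which leave the data unchanged).

```python
import numpy as np

def truncate_block(block, min_value=None, max_value=None):
    """Replace values below min_value or above max_value by those limits."""
    block = np.asarray(block, dtype=float)
    lo = block.min() if min_value is None else min_value
    hi = block.max() if max_value is None else max_value
    return np.clip(block, lo, hi)

energies = np.array([[-12.0, 0.5, 45.0],
                     [-3.0, 7.0, 30.0]])
truncated = truncate_block(energies, min_value=-5.0, max_value=30.0)
unchanged = truncate_block(energies)      # default limits: nothing is truncated
```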

 

min SD

The minimum acceptable standard deviation of any variable belonging to this block. All variables in this block having a lower standard deviation will be considered inactive and will not be used in further calculations.
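
The min SD filter can be sketched as computing the standard deviation of each variable and flagging as inactive those that fall below the threshold. The function name active_by_min_sd is hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption, not a statement about GOLPE's internals.

```python
import numpy as np

def active_by_min_sd(block, min_sd):
    """Return a boolean mask: True where a variable's SD is at least min_sd."""
    block = np.asarray(block, dtype=float)
    sd = block.std(axis=0, ddof=1)        # assumption: sample standard deviation
    return sd >= min_sd

block = np.array([[0.0, 1.0, 5.0],
                  [0.0, 1.1, 9.0],
                  [0.0, 0.9, 1.0]])
# Column SDs are 0.0, 0.1 and 4.0, so only the last variable stays active.
active = active_by_min_sd(block, min_sd=0.5)
```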

 

 

When the Apply button is pressed, all the settings in this dialog window are applied to the data file and saved as its new default. Note that from this moment on, unless the settings are explicitly changed, this pretreatment will always be applied to the data, even if the data file is reloaded. If the User presses the Cancel button, no change is made to the data or to the current pretreatment settings. Press the Clear button to fill all the options in this dialog window with the default values.

 


 

Pretreatment>>>Classic Pretreatment>>>Set-up pretreatment (BUW)

 

This command allows the User to define how the data will be pretreated. Please note that GOLPE automatically applies pretreatment to any data file before using it for modeling, variable selection, plots, and so on. If the User does not explicitly define the pretreatment operations to be used, the program applies a default (dummy) pretreatment.

 

We recommend using this command for simple pretreatments, and the Advanced pretreatment available in Pretreatment>>>Advanced Pretreatment for more complete ones.

 

A dialog window like this is presented:

 

Classic Pretreatment Dialog II

 

This command is active only when more than one block of variables is present. When it is selected, GOLPE calculates block weights that give the same initial importance to each block. The name BUW stands for Block Unscaled Weights.

 

BUW

A coefficient that weights this block of variables. BUW coefficients are set automatically to give the same relative importance to each block of variables. They can be changed by the User, although this is not advisable. BUW coefficients are obtained by equalizing the sum of squares of each block of variables, with the following result: inside each block the variables remain unchanged with respect to each other, but all the blocks are given the same initial importance.
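
One way to picture how equalizing the block sums of squares could work is sketched below. The exact formula used by GOLPE is not given here, so several points are assumptions: the sum of squares is taken over the raw (uncentred) values, the common target is the mean of the block sums of squares, and every block has a non-zero sum of squares. The function name buw_weights is hypothetical.

```python
import numpy as np

def buw_weights(blocks):
    """Return one weight per block so that all weighted blocks share the same sum of squares."""
    ss = np.array([np.sum(np.asarray(b, dtype=float) ** 2) for b in blocks])
    target = ss.mean()                    # assumption: common target sum of squares
    # Multiplying a block by w multiplies its sum of squares by w**2.
    return np.sqrt(target / ss)

blocks = [np.full((2, 8), 2.0),           # large block, sum of squares 64
          np.array([[1.0], [3.0]])]       # single variable, sum of squares 10
weights = buw_weights(blocks)
scaled = [b * w for b, w in zip(blocks, weights)]
```

Because each block is multiplied by a single coefficient, the ratios between variables inside a block are preserved; only the balance between blocks changes.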

 

min value

The minimum acceptable value in this block of variables. If any value, for any of the variables included in this block, is lower than this minimum, the value is truncated and replaced by the value of the minimum.

The default value is the minimum value present, for any of the variables included in this block, in the data file.

 

max value

The maximum acceptable value in this block of variables. If any value, for any of the variables included in this block, is higher than this maximum, the value is truncated and replaced by the value of the maximum.

The default value is the maximum value present, for any of the variables included in this block, in the data file.

 

min SD

The minimum acceptable standard deviation of any variable belonging to this block. All variables in this block having a lower standard deviation will be considered inactive and will not be used in further calculations.

 

 

When the Apply button is pressed, all the settings in this dialog window are applied to the data file and saved as its new default. Note that from this moment on, unless the settings are explicitly changed, this pretreatment will always be applied to the data, even if the data file is reloaded. If the User presses the Cancel button, no change is made to the data or to the current pretreatment settings. Press the Clear button to fill all the options in this dialog window with the default values.

 


 

Pretreatment>>>Delete unselected var.s (D-optimal)

Pretreatment>>>Delete unselected var.s (F. Factorial)

 

IMPORTANT: the D-optimal selection list or the F. Factorial selection list will appear only after performing a variable selection procedure.

 

These commands temporarily remove from the analysis the variables left unselected by the D-optimal Preselection or the F. Factorial Variable Selection procedures. Neither method actually deletes any variable from the data: the variables are marked as 'inactive' so that they are not considered in further analysis, but the data file stored on disk is not changed.

 

A dialog window like this is presented (its appearance is slightly different for F. Factorial):

 

Delete Unselected (D-optimal)

 

D-optimal selection list

F. Factorial selection list

This list contains all the variable selection procedures previously performed on this data file. Each procedure is identified by a sequential number, the initial number of variables, the final number of variables, the number of components, and the time and date when the procedure finished. Click on any item to select it. The selected item will be shown in the Selection input field.

 

Selection:

The selection from the above list is shown in the Selection input field.

 

When the OK button is pressed, the variable selection procedure chosen will be read and all the variables not selected in this procedure will be temporarily considered inactive. All these variables will be ignored by GOLPE until the command Pretreatment>>>Reload original variables is executed or the data file is reloaded from disk.

 

The number of X-variables that remain active after the deletion will be shown in the main window for reference.

 


 

Pretreatment>>>Delete specific var.s

 

This option allows the User to specify variables to be removed from subsequent analysis. The variables are not actually removed from the file, and the effect of this command can be reversed using Pretreatment>>>Reload original variables. This procedure is not suitable for large datasets (more than 100,000 variables).

The command opens this dialog:

 

Delete specific var.s dialog

 

Variables

This list contains all the variables in the data set. By default, all variables are marked as "keep". Click on any variable name to toggle its state from keep to remove and vice versa.

 

Tools

Click on the keep all button to mark all the variables as variables to keep in the data set. Click on the remove all button to mark all the variables as variables to remove from the data set.

 

When the OK button is pressed, the variables marked as remove will be temporarily considered inactive. All these variables will be ignored by GOLPE until the command Pretreatment>>>Reload original variables is executed or the data file is reloaded from disk.

 

The number of X-variables that remain active after the deletion will be shown in the main window for reference.

 


 

Pretreatment>>>Reload original variables

 

This option will reload the original variables, thus reverting the effect of the commands which delete unselected variables. The number of X-variables that are active after the reloading will be shown in the main window for reference.
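
The delete and reload commands described in this section never alter the data on disk; conceptually they only switch an active/inactive flag per variable. The sketch below models that idea with a boolean mask. The class VariableMask and its method names are hypothetical illustrations, not part of GOLPE.

```python
import numpy as np

class VariableMask:
    """Tracks which X-variables are active without touching the data itself."""

    def __init__(self, n_variables):
        self.active = np.ones(n_variables, dtype=bool)

    def keep_only(self, selected):
        """Deactivate every variable not chosen by a selection procedure."""
        keep = np.zeros_like(self.active)
        keep[list(selected)] = True
        self.active &= keep

    def remove(self, indices):
        """Deactivate specific variables marked as 'remove'."""
        self.active[list(indices)] = False

    def reload_original(self):
        """Reactivate all variables, undoing every previous deletion."""
        self.active[:] = True

    def n_active(self):
        return int(self.active.sum())

mask = VariableMask(6)
mask.keep_only([0, 2, 3, 5])      # e.g. variables retained by a selection run
mask.remove([3])                  # e.g. one variable deleted by hand
after_delete = mask.n_active()
mask.reload_original()
after_reload = mask.n_active()
```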