
[minimum cutoff][maximum cutoff][zeroing values][min. sd cutoff][n-level vars.][autoscale][BUW scale][D-opt. exclude][FFD exclude][Transforms][Cut out]
Pretreatment>>>Advanced Pretreatment>>>minimum cutoff
The tool asks the User for a minimum cutoff value, which can be different for each type of X-variable. Values lower than the minimum are then replaced in the data file by the cutoff value. In 3D-QSAR, applying a minimum cutoff is less common than applying a maximum cutoff, and it should be done only when the User has a good reason.
The dialog window works interactively. The User can select a tentative cutoff by sliding the scale. The program immediately shows, on the right, the number of values that would be truncated if this cutoff were applied. The process can be repeated until the User is satisfied with the values chosen.
Please note that the scale can be moved by dragging the mouse, but it can also be moved with the left and right arrow keys of the keyboard to make very precise step changes.

The 2D Plot button shows a histogram of the distribution of the X-values, and the Profile button shows a profile of the X data. Both plots can be helpful in selecting an appropriate minimum cutoff value.
The cutoff is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
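The effect of the minimum cutoff can be illustrated with a minimal Python sketch. This is not GOLPE code: the function name `apply_min_cutoff` and the example energies are hypothetical, and the sketch only mirrors the behaviour described above (values below the cutoff are replaced by the cutoff, and the number of truncated values is reported).

```python
def apply_min_cutoff(values, cutoff):
    """Replace every value lower than `cutoff` by `cutoff`.

    Returns the truncated list and the number of values that were changed,
    which corresponds to the count shown on the right of the dialog.
    """
    truncated = [max(v, cutoff) for v in values]
    n_changed = sum(1 for v in values if v < cutoff)
    return truncated, n_changed


# Hypothetical interaction energies (kcal/mol) for one type of X-variable.
energies = [-12.4, -3.1, 0.2, 7.5, -8.9]
truncated, n_changed = apply_min_cutoff(energies, cutoff=-8.0)
print(truncated, n_changed)  # prints [-8.0, -3.1, 0.2, 7.5, -8.0] 2
```

Trying several tentative cutoffs and comparing the reported counts reproduces, in a simplified way, what the slider does interactively.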
Pretreatment>>>Advanced Pretreatment>>>maximum cutoff
This tool asks the User for a maximum cutoff value, which can be different for each type of X-variable. Values higher than the maximum are then replaced in the data file by the cutoff value.
Defining an appropriate cutoff is of critical importance in 3D-QSAR studies. Data coming from grid analyses are usually obtained from molecular mechanics calculations. In these methodologies, the interaction energies calculated at grid points close to the Van der Waals surface of the molecules are given highly positive (repulsive) values. This makes sense from a purely theoretical point of view but, when building the model, these high values artificially increase the variance of a few variables which in fact contain little information, thus leading to undesirable effects. In this situation, highly positive energies should be truncated to a value of 30 kcal/mol according to some authors, or to 5 kcal/mol according to others. Our suggestion is to truncate the highly positive values so as to make the overall data distribution more symmetrical. For example, if the minimum value in the data file is about -10 kcal/mol, it is convenient to truncate the positive values to 10 kcal/mol.
The dialog window works interactively. The User can select a tentative cutoff by sliding the scale. The program immediately shows, on the right, the number of values that would be truncated if this cutoff were applied. The process can be repeated until the User is satisfied with the values chosen.
Please note that the scale can be moved by dragging the mouse, but it can also be moved with the left and right arrow keys of the keyboard to make very precise step changes.

The 2D Plot button shows a histogram of the distribution of the X-values, and the Profile button shows a profile of the X data. Both plots can be helpful in selecting an appropriate maximum cutoff value.
The cutoff is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
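The suggestion of choosing a maximum cutoff that makes the distribution more symmetrical can be sketched in Python as follows. This is only an illustration, not GOLPE code: the function names and the example data are hypothetical, and the sketch assumes that the block contains at least one negative value.

```python
def symmetric_max_cutoff(values):
    """Suggest a maximum cutoff equal to the magnitude of the most negative value.

    Assumes the data contain negative (attractive) energies.
    """
    return abs(min(values))


def apply_max_cutoff(values, cutoff):
    """Replace every value higher than `cutoff` by `cutoff`.

    Returns the truncated list and the number of values that were changed.
    """
    truncated = [min(v, cutoff) for v in values]
    n_changed = sum(1 for v in values if v > cutoff)
    return truncated, n_changed


# Hypothetical grid energies (kcal/mol) with two strongly repulsive points.
energies = [-10.0, -2.5, 0.0, 4.0, 35.0, 120.0]
cutoff = symmetric_max_cutoff(energies)
truncated, n_changed = apply_max_cutoff(energies, cutoff)
print(cutoff, truncated, n_changed)
# prints 10.0 [-10.0, -2.5, 0.0, 4.0, 10.0, 10.0] 2
```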
Pretreatment>>>Advanced Pretreatment>>>zeroing values
Small values close to zero tend to introduce noise into the data files, and it is appropriate to set these values to 0.000 (zeroing), since this range is very narrow compared to the energy domain of 20 kcal/mol.
This tool allows the User to perform the zeroing of small values. First the User is asked which kind of values are going to be zeroed:
Select positive and/or negative values by clicking in the check boxes and then press the OK button, or press the Cancel button to abort the operation.
Then a new dialog window is presented, in which the User can select the absolute critical value. Values having an absolute value lower than the critical value will be replaced by 0.000. To select the critical value for each block of variables, simply slide the scale. The program immediately reports the number of variables that would be zeroed if this value were applied. The User can select values from 0.01 up to 0.50. We suggest using smaller critical values for the negative (attractive) values (say 0.1) than for the positive (repulsive) values (say 0.5).
Please note that the scale can be moved by dragging the mouse, but it can also be moved with the left and right arrow keys of the keyboard to make very precise step changes.

The 2D Plot button shows a histogram of the distribution of the X-values, and the Profile button shows a profile of the X data. Both plots can be helpful in selecting an appropriate zeroing value.
The zeroing is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
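As an illustration of the zeroing described above, the following Python sketch applies separate critical values to positive and negative values. It is not GOLPE code: the function name `zero_small_values` and its parameters are hypothetical, and passing `None` for a sign is used here to mimic leaving the corresponding check box unselected.

```python
def zero_small_values(values, pos_critical=None, neg_critical=None):
    """Set small values to 0.0, with a separate critical value per sign.

    A positive value is zeroed when it is lower than `pos_critical`;
    a negative value is zeroed when its absolute value is lower than
    `neg_critical`. A critical value of None leaves that sign untouched.
    Returns the new list and the number of values that were zeroed.
    """
    result = []
    n_zeroed = 0
    for v in values:
        if pos_critical is not None and 0.0 < v < pos_critical:
            result.append(0.0)
            n_zeroed += 1
        elif neg_critical is not None and -neg_critical < v < 0.0:
            result.append(0.0)
            n_zeroed += 1
        else:
            result.append(v)
    return result, n_zeroed


# Hypothetical field values; 0.5 for repulsive and 0.1 for attractive values.
field = [0.3, -0.05, -0.3, 0.7, 0.0]
print(zero_small_values(field, pos_critical=0.5, neg_critical=0.1))
# prints ([0.0, 0.0, -0.3, 0.7, 0.0], 2)
```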
Pretreatment>>>Advanced Pretreatment>>>min. sd cutoff
Variables with a low standard deviation (sd) are variables showing very small variance in the data file.
This tool asks the User for a minimum sd cutoff value, which can be different for each type of X-variable. Variables whose standard deviation is lower than this minimum are then set as inactive. This option corresponds to the MINIMUM_SIGMA cutoff in SYBYL/CoMFA.
The dialog window works interactively. The User can select a tentative cutoff by sliding the scale. The program immediately shows, on the right, the number of variables that would become inactive if this cutoff were applied. The process can be repeated until the User is satisfied with the values chosen.
Please note that the scale can be moved by dragging the mouse, but it can also be moved with the left and right arrow keys of the keyboard to make very precise step changes.
The 2D Plot button shows a histogram of the distribution of the X-values, and the Profile button shows a profile of the X data. Both plots can be helpful in selecting an appropriate minimum sd value.
The cutoff is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
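The minimum sd filter can be sketched in Python as shown below. This is an illustration under stated assumptions, not GOLPE code: the function name `active_mask_by_sd` is hypothetical, each variable is represented as a list of its values over all objects, and the sample standard deviation (n-1 denominator) is used because this manual does not specify which estimator GOLPE applies.

```python
import statistics


def active_mask_by_sd(columns, min_sd):
    """Return one flag per variable: True if it stays active.

    `columns` is a list of variables; each variable is the list of its
    values over all objects. A variable becomes inactive (False) when its
    sample standard deviation is lower than `min_sd`.
    """
    return [statistics.stdev(col) >= min_sd for col in columns]


# Three hypothetical variables measured on four objects.
x_block = [
    [1.0, 1.0, 1.1, 1.0],   # nearly constant
    [0.0, 2.0, -1.5, 3.0],  # well spread
    [5.0, 5.0, 5.0, 5.0],   # constant
]
mask = active_mask_by_sd(x_block, min_sd=0.1)
print(mask, "inactive:", mask.count(False))
```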
Pretreatment>>>Advanced Pretreatment>>>n-level vars.
The N-level variables are variables which take only a few values and, in addition, have a poor distribution of the objects among these levels. They are dangerous because they force the model to explain most of the variance of a few objects with a high leverage, thus leading to spurious and misleading results.
According to the number of levels and the distribution of values we can distinguish three types of n-level variables.
2-level variables
Variables which take only 2 values in the whole data file, one of which appears in only 1, 2 or 3 objects, following one of these three patterns:
| level 1 | level 2 |
| 1 object | rest of objects (n-1) |
| 2 objects | rest of objects (n-2) |
| 3 objects | rest of objects (n-3) |
3-level variables
Variables which take only 3 values in the whole data file, one of which appears in only 1 or 2 objects and a second one also in only 1 or 2 objects, following one of these three patterns:
| level 1 | level 2 | level 3 |
| 1 object | 1 object | rest of objects (n-2) |
| 1 object | 2 objects | rest of objects (n-3) |
| 2 objects | 2 objects | rest of objects (n-4) |
4-level variables
Variables which take only 4 values in the whole data file, following one of these three patterns:
| level 1 | level 2 | level 3 | level 4 |
| 1 object | 1 object | 1 object | rest of objects (n-3) |
| 1 object | 1 object | 2 objects | rest of objects (n-4) |
| 1 object | 2 objects | 2 objects | rest of objects (n-5) |
Not all N-level variables are equally dangerous. 2-level variables, and particularly the 1-N kind of 2-level variables, are extremely dangerous and we recommend removing them from the data file in any case. 3-level and 4-level variables might not have such a big impact on the models, and it is up to the User whether to keep them in the data file or not. In any case we suggest checking the position of such variables in the model after the final model is obtained. If such ill-distributed variables, left in the data, have taken the lead of the model, the model will be based upon the variation of a few objects (molecules) and the result may be doubtful.
This tool is intended to remove the N-level variables from the file in order to avoid ill conditioning of the data. First the User is prompted to select the block or blocks of X-variables to which the analysis will be applied. Each block is identified by a number, as defined in File>>>Type of variables>>>Modify.
Select the check boxes, as appropriate, and then press the OK button. Press the Cancel button to abort the operation.
Then a new dialog window appears, showing the type and number of N-level variables found in the selected blocks of variables. The User can select the kind of N-level variables to remove simply by clicking in the check boxes.
The selected N-level variables are removed from the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
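The patterns listed above for 2-, 3- and 4-level variables can be checked with a short Python sketch. This is an illustration only, not the algorithm used by GOLPE: the names `PATTERNS` and `n_level_type` are hypothetical, levels are identified by exact equality of values, and the most populated level is taken as the "rest of objects".

```python
from collections import Counter

# Object counts of the minority levels (sorted), keyed by number of levels,
# transcribed from the three pattern tables above.
PATTERNS = {
    2: {(1,), (2,), (3,)},
    3: {(1, 1), (1, 2), (2, 2)},
    4: {(1, 1, 1), (1, 1, 2), (1, 2, 2)},
}


def n_level_type(values):
    """Return 2, 3 or 4 if the variable matches an N-level pattern, else None.

    `values` holds the values of one variable over all objects.
    """
    counts = sorted(Counter(values).values())
    n_levels = len(counts)
    minority = tuple(counts[:-1])  # every level except the most populated one
    if minority in PATTERNS.get(n_levels, set()):
        return n_levels
    return None


# Hypothetical variables measured on ten and nine objects, respectively.
print(n_level_type([0.0] * 9 + [1.5]))            # prints 2 (the 1-N kind)
print(n_level_type([0.0] * 6 + [1.0, 1.0, 2.0]))  # prints 3
print(n_level_type([0.0] * 5 + [1.0] * 5))        # prints None
```

A variable with two well-populated levels, as in the last call, is not flagged, because neither level is restricted to a few objects.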
Pretreatment>>>Advanced Pretreatment>>>autoscale
The autoscale tool applies autoscaling to the data file. It is possible to autoscale individual blocks of X- and Y-variables and also to define weight factors to equalize the importance of the different blocks of variables in the models. Initially all the weights are set to 1.000 (the blocks are not weighted). If the User wants to change the influence of some blocks of variables in the model, it is possible to set their weight factors accordingly. This is the case, for instance, when a single variable (like Log P) is included together with a large set of field variables. If all the blocks are given the same importance, the effect of Log P in the model would most probably be hidden by the effect of the field variables, simply because there are many more variables in the field-type block.
To autoscale a block of variables, click the yes radio button. To change the weight factor, enter the new value in the corresponding input field.
IMPORTANT: Change the weight factors only when reliable external information on the blocks of variables is available. In the absence of prior information it is better to use the BUW scaling.
The autoscaling is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
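A minimal Python sketch of autoscaling a block with a weight factor is given below. It is not GOLPE code and rests on assumptions: the function name `autoscale_block` is hypothetical, autoscaling is taken in its conventional sense (each variable is mean-centred and divided by its sample standard deviation), and constant variables are simply set to zero.

```python
import statistics


def autoscale_block(columns, weight=1.0):
    """Autoscale every variable of a block and multiply it by `weight`.

    `columns` is a list of variables; each variable is the list of its
    values over all objects. Constant variables (sd = 0) become all zeros.
    """
    scaled = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        if sd == 0.0:
            scaled.append([0.0 for _ in col])
        else:
            scaled.append([weight * (v - mean) / sd for v in col])
    return scaled


# A hypothetical single-variable block (for example Log P) given weight 2.
logp_block = [[1.0, 2.0, 3.0]]
print(autoscale_block(logp_block, weight=2.0))
```

After autoscaling, every variable has unit variance, so a block with many variables carries more total variance than a single-variable block; the weight factor is what compensates for this.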
Pretreatment>>>Advanced Pretreatment>>>BUW scale
This tool allows the User to apply different weight factors to each block of variables in order to give each one the same initial importance in the models. By default, weights aimed at this goal are suggested, but the User can customize the suggested values (not advisable). The name BUW stands for Block Unscaled Weights.

In this example, block X2 is given 30% more weight than block X1, so that both blocks have the same importance in the model. The results of this procedure are easy to check by comparing the 3D-loading plots of models made with and without BUW.
BUW coefficients are obtained by equalizing the sum of squares of each block of variables, with the following result: inside each block the variables are not autoscaled, but all the blocks are given the same initial importance.
The BUW scaling is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
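The idea of equalizing the sum of squares of the blocks can be sketched in Python as follows. This is a sketch under stated assumptions, not the GOLPE implementation: the function names are hypothetical, the sum of squares is computed after mean-centring each variable, the weights are expressed relative to the first block (which therefore gets weight 1), and every block is assumed to have a non-zero sum of squares.

```python
import math
import statistics


def block_sum_of_squares(columns):
    """Sum of squared deviations from the mean over all variables of a block."""
    total = 0.0
    for col in columns:
        mean = statistics.mean(col)
        total += sum((v - mean) ** 2 for v in col)
    return total


def buw_weights(blocks):
    """Return one weight per block so that all weighted blocks share the
    sum of squares of the first block. Variables inside a block keep their
    relative scales, since a single factor is applied to the whole block.
    """
    ss = [block_sum_of_squares(b) for b in blocks]
    return [math.sqrt(ss[0] / s) for s in ss]


# Two hypothetical blocks measured on two objects.
x1 = [[0.0, 2.0], [1.0, 3.0]]  # two variables, sum of squares 4.0
x2 = [[0.0, 1.0]]              # one variable, sum of squares 0.5
print(buw_weights([x1, x2]))
```

Multiplying every value of a block by its weight scales the block sum of squares by the square of that weight, which is why the weight is a square root.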
Pretreatment>>>Advanced Pretreatment>>>D-opt. exclude
Pretreatment>>>Advanced Pretreatment>>>FFD exclude
These tools exclude from the data file the variables not selected in the D-optimal preselection or in the Fractional Factorial (FFD) variable selection procedure. Both work exactly as described for the Pretreatment>>>Delete unselected var.s command. These tools are included here to allow the User to save data files after the variable selection, so that it is unnecessary to apply the Pretreatment>>>Delete unselected var.s command each time the data file is loaded.
Pretreatment>>>Advanced Pretreatment>>>Transforms
This tool is intended to carry out numerical transformations on some variables of the data file. In particular, it includes some of the transformations most commonly required for the dependent variables (Y).
The transformations are applied to all the variables in a given block; therefore, the first step is to choose the block or blocks of variables to which the transformations will be applied. By default, GOLPE selects the Y blocks.
Then, the User has to choose the transformation to apply:
The choices are:
| Transform | Each variable (x) in the block will be replaced by | Formula |
| logarithm (pK) | the base-10 logarithm of its inverse | log(1/x) |
| logarithm (log) | its base-10 logarithm | log(x) |
| inverse (1/x) | its inverse | 1/x |
| exponential (e^x) | its exponential | e^x |
The transformations are applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
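The four transformations of the table can be written out in Python as a minimal sketch. This is not GOLPE code: the dictionary `TRANSFORMS`, its keys and the function `transform_block` are hypothetical names chosen for illustration. Note that the two logarithmic transforms and the inverse are only defined for suitable values (x > 0 for the logarithms, x different from 0 for the inverse).

```python
import math

# One function per transformation; x is a single value of a variable.
TRANSFORMS = {
    "pK": lambda x: math.log10(1.0 / x),   # log(1/x)
    "log": lambda x: math.log10(x),        # log(x)
    "inverse": lambda x: 1.0 / x,          # 1/x
    "exponential": lambda x: math.exp(x),  # e^x
}


def transform_block(columns, name):
    """Apply the named transformation to every variable of a block."""
    f = TRANSFORMS[name]
    return [[f(v) for v in col] for col in columns]


# Hypothetical Y block: activities expressed as molar concentrations.
y_block = [[1e-6, 1e-3, 1e-9]]
print(transform_block(y_block, "pK"))  # approximately [[6.0, 3.0, 9.0]]
```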
Pretreatment>>>Advanced Pretreatment>>>Cut out
Molecular Interaction Fields are computed in a rectangular 3D box. Sometimes not all points in this box are equally important. For example, in a PCA/GRID analysis the interesting region is in the center, while the corners of the box contain parts of the proteins that are not well superimposed and which introduce a large amount of spurious variation in the data. A different situation arises when using MEP data. These data are computed for every point in the cage, and points very close to the atomic nuclei tend to produce extremely high positive values. In both of these examples it would be appropriate to remove some parts of the grid cage according to simple geometric criteria.
GOLPE 4.5 incorporates a Pretreatment tool which allows the User to define a geometric criterion to remove parts of the cage. In order to define these parts, the User must supply a PDB file containing the coordinates of a template molecule. The position of every grid node inside the cage is compared with the atoms of the template. If the distance between the node and an atom of the template is within a certain range (also defined by the User), the program keeps this node, thus removing nodes too far away from the template. The criterion can be reversed, and then only the points which are far away from the template will be selected.
The command shows a dialog like this:

Find...
Press this button to open a standard file selection dialog. The selected file is then shown in the input line immediately to the left of the button. See the File>>>Open data file command for details about the file selection dialog.
template (.pdb)
Name of a valid PDB file describing a template structure. The template structure can correspond to a real structure, a set of structures or a bunch of dummy atoms. GOLPE only pays attention to the atomic coordinates, and therefore the elements, residues, etc. can be completely arbitrary. No connectivity section is required. There is no formal limit to the size of this file, but filtering large grid cages using very large PDB files can take some time.
select points:
If this control is set to within (default), the grid nodes within the selected radius will be preserved and the external nodes will be removed. On the contrary, if the control is set to outside, the grid nodes within the selected radius will be removed and only the external nodes will be present in the dataset.
radius:
The value of this radius defines the cutoff for preserving or removing grid nodes. When a grid node is farther than this radius from every atom of the template structure, it will be removed (within) or preserved (outside). The value entered should be a positive number higher than 0.000.
The cut out is applied to the data file only when the OK button is pressed. The Cancel button closes this tool without modifying the data.
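The geometric selection described above can be sketched in Python as follows. This is an illustration only, not GOLPE code: the function names are hypothetical, coordinates are read from the fixed columns of ATOM/HETATM records of the PDB format, and a node lying exactly at the radius is treated as being within it, which is an assumption.

```python
import math


def read_pdb_coordinates(lines):
    """Extract (x, y, z) from ATOM/HETATM records; everything else is ignored."""
    coords = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed columns: x = 31-38, y = 39-46, z = 47-54 (1-based).
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def select_nodes(nodes, atoms, radius, mode="within"):
    """Keep the grid nodes that satisfy the distance criterion.

    mode="within":  keep nodes at most `radius` away from at least one atom.
    mode="outside": keep nodes farther than `radius` from every atom.
    """
    kept = []
    for node in nodes:
        near = any(math.dist(node, atom) <= radius for atom in atoms)
        if near == (mode == "within"):
            kept.append(node)
    return kept


# Hypothetical template with one dummy atom at the origin and two grid nodes.
atoms = [(0.0, 0.0, 0.0)]
nodes = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(select_nodes(nodes, atoms, radius=2.0))                  # [(1.0, 0.0, 0.0)]
print(select_nodes(nodes, atoms, radius=2.0, mode="outside"))  # [(3.0, 0.0, 0.0)]
```

The sketch compares every node with every atom, which also illustrates why very large templates combined with large grid cages can take some time to filter.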