Abstract: This chapter demonstrates the danger of variable selection bias and the need for external cross-validation (CV) to correctly assess the predictive performance of QSAR models built on automatically selected descriptors. The n-fold cross-validation technique is widely used to estimate the performance of QSAR models. In this procedure, the entire dataset is split into n non-overlapping test sets, each paired with a training set made up of the remaining data. The models obtained on each fold of the internal CV are all based on the same set of descriptors, whereas the corresponding models for the external CV may involve different descriptors. To select descriptors (attributes in Weka's terminology), one must specify both a descriptor search algorithm and a method for computing the value that is optimized during descriptor selection. By default, the descriptor search uses the BestFirst algorithm.
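The selection bias described above can be illustrated with a minimal sketch in plain Python. It is not the chapter's Weka workflow. It assumes that "internal" CV means descriptors are selected once on the full dataset before the folds are formed, and that "external" CV means selection is repeated inside each training fold. The data are pure noise, the ranking criterion (absolute difference of class means) and the nearest-centroid classifier are stand-ins chosen for brevity, and all function names are hypothetical.

```python
import random


def make_noise_dataset(n_samples=40, n_descriptors=1000, seed=0):
    """Random descriptors with balanced labels that are unrelated to them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_descriptors)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def select_descriptors(X, y, rows, k):
    """Return indices of the k descriptors with the largest absolute
    difference of class means, computed only on the given rows."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train_rows, test_rows, cols):
    """Assign each test row to the class with the closer training centroid."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train_rows if y[i] == label]
        centroids[label] = [sum(X[i][j] for i in members) / len(members)
                            for j in cols]
    preds = []
    for i in test_rows:
        dist = {label: sum((X[i][j] - c) ** 2
                           for j, c in zip(cols, centroids[label]))
                for label in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k=10, n_folds=5, select_inside_fold=True):
    """n-fold CV accuracy. With select_inside_fold=False the descriptors
    are chosen once on all data (the biased setting assumed to correspond
    to internal CV); with True they are re-selected on each training fold."""
    all_rows = list(range(len(X)))
    global_cols = select_descriptors(X, y, all_rows, k)
    correct = 0
    for f in range(n_folds):
        test_rows = [i for i in all_rows if i % n_folds == f]
        train_rows = [i for i in all_rows if i % n_folds != f]
        if select_inside_fold:
            cols = select_descriptors(X, y, train_rows, k)
        else:
            cols = global_cols
        preds = nearest_centroid_predict(X, y, train_rows, test_rows, cols)
        correct += sum(p == y[i] for p, i in zip(preds, test_rows))
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_noise_dataset()
    print("selection on full data:", cv_accuracy(X, y, select_inside_fold=False))
    print("selection inside folds:", cv_accuracy(X, y, select_inside_fold=True))
```

Because the labels carry no information about the descriptors, an honest estimate should stay near chance level. Selecting descriptors on the full dataset lets the test compounds influence which descriptors are chosen, so that estimate is expected to come out optimistically high, while re-selecting inside each fold should not show this inflation.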