Chapter 10. Cross‐Validation and the Variable Selection Bias

Authors: Baskin Igor I., Gilles Marcou, Dragos Horvath, Alexandre Varnek
Collection: Tutorials in Chemoinformatics. First Edition. Edited by Alexandre Varnek
Chapter in a collective monograph
Year of publication: 2017
Place of publication: John Wiley & Sons Ltd, United Kingdom
First page: 163
Last page: 173
DOI: 10.1002/9781119161110.ch10
Abstract: This chapter demonstrates the danger of the variable selection bias and the need for external cross-validation for correct assessment of the prediction performance of QSAR models based on automatically selected descriptors. The n-fold cross-validation technique is widely used to estimate the performance of QSAR models. In this procedure, the entire dataset is divided into n non-overlapping pairs of training and test sets. The models obtained on each fold of the internal CV are based on the same set of descriptors, whereas the corresponding models for the external CV may involve different descriptors. In order to select descriptors (attributes in Weka's terminology), one should specify a descriptor search algorithm and a way to compute the value being optimized in the course of descriptor selection. The default setting for descriptor search is to use the BestFirst algorithm.
Added to the system by: Baskin Igor Iosifovich
