ISTINA: Intelligent System for Thematic Investigation of Scientometric Data
INTRODUCTION: Metabolomics data often contain thousands of features, but only some of them carry useful information about clinical status or other sources of systems biology data. One of the first steps toward realizing global concepts such as personalized medicine and systems biology is to compile a list of the most stable and robust approaches for extracting informative metabolites.

OBJECTIVES: The primary aim of our research was to apply machine learning principles to the selection of important features from metabolomics data without heavy and unstable preprocessing stages (such as QC-based correction, scaling, transformation, or decomposition). We applied only creatinine normalization and half-minimum missing value imputation to the raw data.

METHODS: LC-MS analysis of 40 urine samples was performed on a C18 column (Waters) coupled to an IT-TOF instrument (Shimadzu). The metabolite data table was obtained from iMet-Q software after integration and alignment. All calculations for model training, resampling, hyperparameter tuning, variable importance ranking, computing feature frequency across resampling stages, and recursive feature elimination were done in the R environment (mainly with the caret package). All other computations were also performed in R. The resulting data processing pipeline was tested on one metabolomics dataset from an open repository.

RESULTS: In all datasets (experimental and open-repository), the clinical groups were clearly and correctly separated by hierarchical cluster analysis and principal component analysis. Correct pattern recognition was achieved for the reduced datasets after feature selection based on a combination of machine learning training and the results of univariate analysis.

CONCLUSION: This report briefly demonstrates the potential for creating and validating useful approaches to marker research in high-dimensional data.
Combining the efforts of many researchers can lead to the adoption of more rational and robust techniques than the classical methods (ANOVA, FCA, PCA, VIP scores from (s,o)-PLSDA), especially for non-linear and complex problems.

This work was funded by the Russian Foundation for Basic Research (RFBR), research project No. 19-33-90071.
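
The two preprocessing steps named in the abstract, creatinine normalization and half-minimum missing value imputation, can be illustrated with a minimal sketch. This is not the authors' code (their work was done in R). The function names, the toy numbers, the use of NaN to mark missing values, and the choice to normalize before imputing are all assumptions made for illustration.

```python
import numpy as np


def creatinine_normalize(intensities, creatinine):
    """Divide every feature of a sample by that sample's creatinine value.

    intensities: samples x features array (NaN marks a missing value; assumption).
    creatinine:  one creatinine measurement per sample.
    """
    intensities = np.asarray(intensities, dtype=float)
    creatinine = np.asarray(creatinine, dtype=float)
    return intensities / creatinine[:, None]


def half_minimum_impute(table):
    """Replace missing values in each feature by half of that feature's
    smallest observed value. Features with no observed values are left as NaN."""
    table = np.array(table, dtype=float)  # work on a copy
    for j in range(table.shape[1]):
        column = table[:, j]  # view into the copy
        missing = np.isnan(column)
        if missing.all():
            continue
        column[missing] = np.nanmin(column) / 2.0
    return table


# Toy data (hypothetical): 3 urine samples x 3 features.
raw = np.array([
    [100.0, np.nan, 30.0],
    [200.0, 40.0, np.nan],
    [50.0, 10.0, 20.0],
])
creatinine = np.array([2.0, 4.0, 1.0])

# Assumed order: normalize first, then impute on the normalized scale.
processed = half_minimum_impute(creatinine_normalize(raw, creatinine))
print(processed)
```

Imputing after normalization means the half-minimum is taken on the creatinine-adjusted scale. The abstract does not state which order was used, so the reverse order may be equally valid.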
No. | Name | Description | File name | Size | Added |
---|---|---|---|---|---|
1. | 2019_EU_19a_plyush1993_40179_poster.pdf | 2019_EU_19a_plyush1993_40179_poster.pdf | 583.6 KB | 4 October 2019 [Plyush1993] |