Publisher location:Road Town, United Kingdom
First page:924
Last page:929
Abstract:The recognition of special structural segments of the genome called promoters is studied. Machine learning methods based on logical analysis and classification of data are applied to the gene promoter prediction problem for the first time. These methods search for informative fragments in the feature descriptions of precedents (training examples) and are designed for integer data with a small number of distinct values. The fragments found are readily interpretable and make it possible to distinguish promoters from other regions of the genome; however, searching for them is time-consuming. The problem is studied on the model organism Drosophila melanogaster, and the results of experiments on a large imbalanced sample are presented. Two approaches are considered: the traditional formation of features from k-mers, and the direct application of the classifier to the original data. In the second case, the quality of logical classification is shown to be significantly higher. Several other well-known machine learning algorithms, including random forest, logistic regression, and various gradient boosting models, were also included in the experiments. The best result was achieved by the CatBoost classifier applied directly to the original data.
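
The two feature representations contrasted in the abstract can be illustrated with a minimal sketch. The function names `kmer_counts` and `direct_encoding`, the integer codes assigned to nucleotides, the choice of k, and the example sequence are all assumptions made for illustration; the record does not state the encoding or parameters the authors used.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def kmer_counts(seq, k=3):
    """Count overlapping k-mers; returns a vector of length 4**k."""
    # Fixed lexicographic ordering of all possible k-mers (AA, AC, AG, ...).
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows with ambiguous bases such as N
            vec[index[kmer]] += 1
    return vec


def direct_encoding(seq):
    """Map each nucleotide to a small integer, keeping positional order."""
    codes = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical coding
    return [codes[base] for base in seq]


seq = "TATAAAGC"  # toy sequence, not taken from the paper's data
kmer_features = kmer_counts(seq, k=2)   # position-free counts, length 16
direct_features = direct_encoding(seq)  # one integer per position, length 8
```

The k-mer vector discards where in the sequence each fragment occurs, while the direct encoding keeps one small-range integer per position, which is the kind of input the logical classification methods described above are designed for.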