In this article we describe the Octotron project, which is intended to ensure the reliability and sustainability of a supercomputer. Octotron is based on a formal model of a computing system that describes the system components and their interconnections in graph form. The model defines relations between the data describing the current supercomputer state (monitoring data) that hold whenever all components are functioning properly. These relations are expressed as rules that take real monitoring data as input. If a relation is violated, Octotron registers an emergency situation and performs one of the predefined actions: notifying system administrators, logging, disabling or restarting faulty hardware or software components, etc. This paper describes the general structure of the model, together with details of its implementation and evaluation at the supercomputing center of Moscow State University.
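The idea of a graph model checked by rules can be illustrated with a minimal Python sketch. It is not Octotron's actual implementation or API: the classes `Component`, `Rule` and `Model`, the action `notify_admins`, and the monitoring attributes `cpu_temp` and `status` are all hypothetical. Components are graph nodes carrying their latest monitoring data, interconnections are undirected edges, and each rule is a predicate that should hold while a component of a given kind works properly. When a rule is violated, its attached action runs.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout: this is NOT Octotron's actual code or API.

@dataclass
class Component:
    """A graph node: one hardware or software component of the system."""
    name: str
    kind: str                                  # e.g. "node", "switch"
    attrs: dict = field(default_factory=dict)  # latest monitoring data

@dataclass
class Rule:
    """A relation on monitoring data that holds while things work properly."""
    name: str
    kind: str                                  # component kind it applies to
    holds: Callable[[Component, list], bool]   # (component, neighbours) -> ok?
    action: Callable[[Component, str], None]   # reaction to a violation

class Model:
    def __init__(self):
        self.components = {}   # name -> Component
        self.links = {}        # name -> set of neighbour names (undirected)
        self.rules = []

    def add(self, comp):
        self.components[comp.name] = comp
        self.links.setdefault(comp.name, set())

    def connect(self, a, b):
        # An interconnection between two components becomes an edge.
        self.links[a].add(b)
        self.links[b].add(a)

    def check(self, name):
        # Evaluate every rule for this component's kind; run the action
        # attached to each rule that is violated.
        comp = self.components[name]
        neighbours = [self.components[n] for n in sorted(self.links[name])]
        violated = []
        for rule in self.rules:
            if rule.kind == comp.kind and not rule.holds(comp, neighbours):
                rule.action(comp, rule.name)
                violated.append(rule.name)
        return violated

    def update(self, name, **values):
        # Feed fresh monitoring data in. Rules may read neighbouring
        # components, so re-check the component and all of its neighbours.
        self.components[name].attrs.update(values)
        return [(target, r)
                for target in [name, *sorted(self.links[name])]
                for r in self.check(target)]

alerts = []

def notify_admins(comp, rule_name):
    # Stand-in for notifying administrators: just record the alert.
    alerts.append(f"{comp.name}: {rule_name} violated")

model = Model()
model.add(Component("node1", "node", {"cpu_temp": 45}))
model.add(Component("sw1", "switch", {"status": "up"}))
model.connect("node1", "sw1")
model.rules.append(Rule("cpu_temp_ok", "node",
                        lambda c, nb: c.attrs.get("cpu_temp", 0) < 80,
                        notify_admins))
model.rules.append(Rule("uplink_ok", "node",
                        lambda c, nb: any(n.kind == "switch"
                                          and n.attrs.get("status") == "up"
                                          for n in nb),
                        notify_admins))

print(model.update("node1", cpu_temp=91))  # [('node1', 'cpu_temp_ok')]
print(model.update("node1", cpu_temp=50))  # []
print(model.update("sw1", status="down"))  # [('node1', 'uplink_ok')]
```

The last call shows why the graph matters in this sketch: the switch's own data changed, but the violated rule belongs to the node connected to it, so updates re-check neighbours as well. A production system would also need to deduplicate repeated alerts and support other reactions such as logging or restarting a component; those are omitted here.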