Study of Scheduling Approaches for Batch Processing in Big Data Cluster

Timokhin, I.; Teplov, A.

Authors: Timokhin Ilya, Teplov Aleksey
Collection: Lecture Notes in Computer Science
Year of publication: 2022
Publisher: Springer International Publishing
Publisher location: New York
First page: 533
Last page: 547
DOI: 10.1007/978-3-031-22941-1_39
Abstract: This paper presents different approaches to batch scheduling in a multiprocessor system. In an experimental big data processing framework, the authors use a novel graph-based strategy for optimal use of data locality in HDFS, together with several data-driven heuristics that define the order of task execution within a batch and allocate resources optimally. The authors explain the key principles of building the graph and optimizing it with Dinic’s algorithm and bi-criteria linear search. A model of a real network topology (distributed storage, switches, system cores) was designed and developed. A set of metrics for evaluating the efficiency of each strategy is described (time metrics, CPU metrics, idle metrics). Performance on a real test case and a comparison of core utilization with other scheduling approaches (greedy and non-history-based) are also provided. Visualization of the current processing was designed and implemented in the algorithms to analyze bottlenecks (idle time, efficiency capacity). The paper also explains the parameter tuning process for the data-driven heuristics and how to achieve the best performance with the presented approach.
Added to the system by: Teplov Aleksey Mikhailovich


Type: research article