Abstract: This paper presents several approaches to batch scheduling in multiprocessor systems. In an experimental big data processing framework, the authors use a novel graph-based strategy for optimal use of data locality in HDFS, together with several data-driven heuristics that determine the order of task execution within a batch and allocate resources optimally. The paper explains the key principles of building the graph and optimizing it with Dinic's algorithm and a bi-criteria linear search. A model of a real network topology (distributed storage, switches, system cores) was designed and developed. A set of metrics for evaluating the efficiency of each strategy is described (time metrics, CPU metrics, idle metrics). Performance on a real test case and a comparison of core utilization with other scheduling approaches (greedy and non-history-based) are also provided. A visualization of the processing in progress was designed and implemented to analyze bottlenecks (idle time, efficiency capacity). The paper also explains how the parameters of the data-driven heuristics are tuned and how to achieve the best performance with the presented approach.
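The role of Dinic's algorithm in a locality-aware scheduler can be illustrated with a minimal sketch. This is not the authors' implementation: the graph layout (source to task, task to each node holding a replica of its block, node to sink with capacity equal to its free cores) is an assumption, and the names `assign_local_tasks`, `replicas`, and `slots` are hypothetical. Under these assumptions, the maximum flow equals the largest number of tasks that can run on a node storing their data.

```python
from collections import deque


class Dinic:
    """Max-flow via Dinic's algorithm on an adjacency-list residual graph."""

    def __init__(self, n):
        self.n = n
        # Each edge is [target, residual capacity, index of the reverse edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap):
        self.graph[u].append([v, cap, len(self.graph[v])])
        self.graph[v].append([u, 0, len(self.graph[u]) - 1])

    def _bfs(self, s, t):
        # Build the level graph; return True if the sink is still reachable.
        self.level = [-1] * self.n
        self.level[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, cap, _ in self.graph[u]:
                if cap > 0 and self.level[v] < 0:
                    self.level[v] = self.level[u] + 1
                    queue.append(v)
        return self.level[t] >= 0

    def _dfs(self, u, t, pushed):
        # Push flow along level-increasing edges, skipping exhausted ones.
        if u == t:
            return pushed
        while self.it[u] < len(self.graph[u]):
            edge = self.graph[u][self.it[u]]
            v, cap, rev = edge
            if cap > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(pushed, cap))
                if d > 0:
                    edge[1] -= d
                    self.graph[v][rev][1] += d
                    return d
            self.it[u] += 1
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):
            self.it = [0] * self.n
            while True:
                pushed = self._dfs(s, t, float("inf"))
                if pushed == 0:
                    break
                flow += pushed
        return flow


def assign_local_tasks(replicas, slots):
    """Hypothetical helper.

    replicas: task -> list of nodes holding a replica of its input block.
    slots:    node -> number of free cores.
    Returns (number of data-local tasks, {task: node}).
    """
    tasks, nodes = list(replicas), list(slots)
    source, sink = 0, 1
    task_id = {t: 2 + i for i, t in enumerate(tasks)}
    node_id = {n: 2 + len(tasks) + i for i, n in enumerate(nodes)}
    net = Dinic(2 + len(tasks) + len(nodes))
    for t in tasks:
        net.add_edge(source, task_id[t], 1)
        for n in replicas[t]:
            net.add_edge(task_id[t], node_id[n], 1)
    for n in nodes:
        net.add_edge(node_id[n], sink, slots[n])
    local = net.max_flow(source, sink)
    # A task -> node edge with zero residual capacity carries that task's flow.
    id_to_node = {v: k for k, v in node_id.items()}
    assignment = {}
    for t in tasks:
        for v, cap, _ in net.graph[task_id[t]]:
            if v in id_to_node and cap == 0:
                assignment[t] = id_to_node[v]
    return local, assignment
```

For example, with `replicas = {"t1": ["n1"], "t2": ["n1", "n2"], "t3": ["n1"]}` and `slots = {"n1": 1, "n2": 1}`, only two of the three tasks can be placed next to their data, and `t2` has to take `n2`. Tasks left unassigned would need a non-local placement, which this sketch does not model.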