Abstract: This paper considers multi-agent methods applied to the load balancing problem. Most multi-agent methods are decentralised and consume a part of the network resources to exchange information. This paper proposes a two-layer Multi-Agent method with Reinforcement Learning (MARL) that minimizes the number of communication iterations between agents. Instead of communicating, agents can rely on accumulated experience, that is, a history of states and actions. If the observed state is similar to some previously encountered state, the agent takes the same action it selected in that state; otherwise, communication is required. We propose a memory mechanism for similarity detection based on the Mini-Batch K-Means clustering method. This approach was applied to the MAROH method. An experimental study of the solution quality and the number of communication iterations is presented.
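The following is a minimal illustrative sketch of the experience-memory idea summarized above, not the paper's actual implementation: visited states are clustered incrementally with Mini-Batch K-Means, and a new state reuses a stored action only if it lies within a distance threshold of a known cluster, otherwise communication is assumed to be required. The class name ExperienceMemory, the threshold value, and the cluster count are illustrative assumptions.

```python
# Hypothetical sketch of a clustering-based experience memory (not the paper's code).
import numpy as np
from sklearn.cluster import MiniBatchKMeans


class ExperienceMemory:
    """Clusters visited states and reuses an action when a new state is close enough."""

    def __init__(self, n_clusters=16, threshold=0.5, random_state=0):
        self.kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=random_state)
        self.threshold = threshold    # max distance to treat two states as similar (assumed)
        self.cluster_actions = {}     # cluster index -> action taken in that region
        self.fitted = False

    def add(self, states, actions):
        """Update clusters with a batch of visited states and remember their actions.

        The first batch should contain at least n_clusters states so that the
        incremental K-Means initialization succeeds.
        """
        states = np.asarray(states)
        self.kmeans.partial_fit(states)            # incremental Mini-Batch K-Means update
        labels = self.kmeans.predict(states)
        for label, action in zip(labels, actions):
            self.cluster_actions[int(label)] = action
        self.fitted = True

    def recall(self, state):
        """Return a stored action if the state resembles a known cluster, else None."""
        if not self.fitted:
            return None                            # no experience yet: communicate instead
        state = np.asarray(state).reshape(1, -1)
        distances = self.kmeans.transform(state)[0]  # distances to all cluster centers
        nearest = int(np.argmin(distances))
        if distances[nearest] <= self.threshold:
            return self.cluster_actions.get(nearest)
        return None                                # dissimilar state: communication required
```

In this sketch, the threshold controls the trade-off suggested in the abstract: a larger threshold reuses past actions more often and further reduces communication, at the potential cost of solution quality.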