Decentralized Policy Gradient Method for Mean-Field Linear Quadratic Regulator with Global Convergence

Published in Workshop on Real World Experiment Design and Active Learning at ICML 2020, 2020

The scalability of multi-agent reinforcement learning methods to large populations of agents is drawing increasing attention in both practice and theory. We consider a basic yet important model, the linear quadratic regulator (LQR), in a mean-field approximation scheme that mitigates the curse of action-space dimensionality and the exponential growth of agent interactions. Several methods proposed in the mean-field setting require a centralized controller, which is unrealistic in practice. In this paper, we present the first decentralized policy gradient method (MF-DPGM) for mean-field multi-agent reinforcement learning, in which the exchangeable agents of a large team communicate over a connected network. After a linear transformation of states and policies, the local and mean-field policies are updated in a decoupled way by a decentralized primal-dual gradient algorithm in order to reach a global policy consensus. We also give a rigorous proof of the global convergence rate of MF-DPGM by studying the geometry of the problem and estimating the one-step progress under the decentralized scheme. In addition, extensive experiments are conducted to support our theoretical findings.

Download paper here
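
To make the decentralized policy-gradient-plus-consensus idea concrete, below is a minimal, hypothetical sketch, not the paper's MF-DPGM (which additionally handles the mean-field coupling and the primal-dual consensus constraint). Each agent keeps a local copy of the LQR policy gain, takes an exact policy-gradient step on a truncated-horizon LQR cost, and then averages with its neighbors through a doubly stochastic mixing matrix W defined on the connected communication network. All function and variable names here are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: decentralized policy gradient with consensus averaging
# for a shared discrete-time LQR  x_{t+1} = A x_t + B u_t,  u_t = -K x_t,
# cost J(K) = E[ sum_t x_t' Q x_t + u_t' R u_t ].

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0, horizon=200):
    """Policy gradient of the horizon-truncated LQR cost.

    Uses the standard identity grad = 2[(R + B'P_K B)K - B'P_K A] Sigma_K,
    with P_K and Sigma_K approximated by unrolling the closed-loop dynamics.
    """
    Acl = A - B @ K                       # closed-loop dynamics
    Qk = Q + K.T @ R @ K                  # per-step cost matrix under K
    P = np.zeros_like(Q)
    Sigma = np.zeros_like(Q)
    Mp = np.eye(A.shape[0])               # powers of Acl
    Ms = Sigma0.copy()                    # state covariance at step t
    for _ in range(horizon):              # truncated Lyapunov sums
        P += Mp.T @ Qk @ Mp
        Sigma += Ms
        Mp = Acl @ Mp
        Ms = Acl @ Ms @ Acl.T
    cost = np.trace(P @ Sigma0)
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return cost, grad

def decentralized_pg(A, B, Q, R, Sigma0, W, K_init, steps=300, lr=1e-3):
    """Local gradient step followed by consensus mixing of the policy gains.

    W is a doubly stochastic matrix whose sparsity pattern matches the
    connected communication graph among the N agents.
    """
    N = W.shape[0]
    Ks = [K_init.copy() for _ in range(N)]
    for _ in range(steps):
        grads = [lqr_cost_and_grad(K, A, B, Q, R, Sigma0)[1] for K in Ks]
        # each agent averages the neighbors' post-gradient iterates
        Ks = [sum(W[i, j] * (Ks[j] - lr * grads[j]) for j in range(N))
              for i in range(N)]
    return Ks
```

Because the agents are exchangeable, they share the same (A, B, Q, R); the consensus step drives the local gains toward a common policy, which is the role played by the primal-dual consensus mechanism in MF-DPGM.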