# Learning Multiagent Communication with Backpropagation (NIPS 2016)

## Introduction

This paper introduces CommNet, a neural network for fully cooperative multi-agent tasks in which agents exchange continuous messages. The experiments show that, with this architecture, agents can learn to communicate among themselves before taking actions.

In this network, each agent is controlled by a deep neural network that has access to a communication channel carrying a **continuous vector**. Through this channel, each agent receives the summed transmissions of the other agents. Because the communication is continuous, the model can be trained via **backpropagation** jointly with the network that controls each agent.

This approach applies to a wide range of problems, including those with **partial visibility of the environment** and those with **dynamic variation**, where both the number and the type of agents can vary during execution.

## Settings

This work assumes that all agents cooperate fully to maximize a reward $\mathcal{R}$, independent of their individual contributions. In other words, the agents are treated as a single group pursuing the same reward $\mathcal{R}$.

## Communication Model

In this work, the controller is a large feed-forward neural network that maps the states of all agents to their actions, with each agent occupying a subset of the units.

Mathematically, for $J$ agents, the input to the controller is the concatenation of all agent states, $\textbf{s}=\{s_{1}, \ldots, s_{J}\}$; the output is the concatenation of all actions, $\textbf{a}=\{a_{1}, \ldots, a_{J}\}$; and the controller is the mapping $\textbf{a} = \Phi(\textbf{s})$.

The primary communication module $f^{i}$ can be described as follows. Let $i \in \{0, \ldots, K\}$, where $K$ is the total number of communication steps in the network. Each $f^{i}$ takes two input vectors for each agent $j$, the hidden state $h_{j}^{i}$ and the communication $c_{j}^{i}$, and outputs a vector $h_{j}^{i+1}$:

$$h_{j}^{i+1} = f^{i}(h_{j}^{i}, c_{j}^{i}) \tag{1}$$

$$c_{j}^{i+1} = \frac{1}{J-1} \sum_{j' \neq j} h_{j'}^{i+1} \tag{2}$$

where $f^{i}$ is a single linear layer followed by a non-linearity $\sigma$, so that $h_{j}^{i+1} = \sigma(H^{i}h_{j}^{i} + C^{i}c_{j}^{i})$.
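A minimal NumPy sketch of one such layer may help make the shapes concrete. It assumes $\sigma = \tanh$ and takes the communication vector to be the mean of the other agents' hidden states, which matches the $C^{i}/(J-1)$ scaling used in this section; the function names `comm_step` and `mean_of_others` are illustrative, not from the paper.

```python
import numpy as np

def comm_step(h, c, H, C):
    """One communication layer: tanh(H h_j + C c_j) for every agent j.

    h, c: arrays of shape (J, d), one row per agent.
    H, C: weight matrices of shape (d, d), shared by all agents.
    tanh stands in for the non-linearity sigma (an assumption).
    """
    return np.tanh(h @ H.T + c @ C.T)

def mean_of_others(h):
    """c_j = average hidden state of all agents except j (needs J >= 2)."""
    J = h.shape[0]
    return (h.sum(axis=0, keepdims=True) - h) / (J - 1)

rng = np.random.default_rng(0)
J, d = 4, 3                      # 4 agents, hidden size 3 (arbitrary)
h = rng.normal(size=(J, d))
H = rng.normal(size=(d, d))
C = rng.normal(size=(d, d))

c = mean_of_others(h)
h_next = comm_step(h, c, H, C)   # shape (J, d)
```

Because `H` and `C` are shared by every agent, the same code works for any number of agents $J \geq 2$ without changing the parameters.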

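If we concatenate the hidden states of all agents, Eq. (1) and (2) can be rewritten as $\textbf{h}^{i+1} = \sigma(T^{i}\textbf{h}^{i})$, where $\textbf{h}^{i} = [h_{1}^{i}, h_{2}^{i}, \ldots, h_{J}^{i}]$ and

$$T^{i} = \begin{pmatrix} H^{i} & \bar{C}^{i} & \bar{C}^{i} & \cdots & \bar{C}^{i} \\ \bar{C}^{i} & H^{i} & \bar{C}^{i} & \cdots & \bar{C}^{i} \\ \bar{C}^{i} & \bar{C}^{i} & H^{i} & \cdots & \bar{C}^{i} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \bar{C}^{i} & \bar{C}^{i} & \bar{C}^{i} & \cdots & H^{i} \end{pmatrix},$$

where $\bar{C}^{i} = C^{i}/(J-1)$.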
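The equivalence between the per-agent form and the stacked form can be checked numerically. The sketch below assumes that $T^{i}$ has $H^{i}$ on its block diagonal and $\bar{C}^{i}$ in every off-diagonal block, that $\sigma = \tanh$, and that the communication vector is the mean of the other agents' hidden states; `build_T` is a hypothetical helper name.

```python
import numpy as np

def build_T(H, C, J):
    """Block matrix with H on the diagonal and C/(J-1) everywhere else."""
    d = H.shape[0]
    C_bar = C / (J - 1)
    T = np.tile(C_bar, (J, J))                    # fill every block with C_bar
    for j in range(J):
        T[j * d:(j + 1) * d, j * d:(j + 1) * d] = H   # overwrite diagonal blocks
    return T

rng = np.random.default_rng(1)
J, d = 3, 2
H = rng.normal(size=(d, d))
C = rng.normal(size=(d, d))
h = rng.normal(size=(J, d))

# Per-agent form: c_j is the mean of the other agents' hidden states.
c = (h.sum(axis=0) - h) / (J - 1)
per_agent = np.tanh(h @ H.T + c @ C.T)

# Stacked form: concatenate all h_j into one vector and apply T once.
stacked = np.tanh(build_T(H, C, J) @ h.reshape(-1)).reshape(J, d)

print(np.allclose(per_agent, stacked))
```

Viewing the layer this way shows that the model is an ordinary feed-forward network whose weight matrix is constrained to this block structure, which is what makes it invariant to the ordering of agents.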

At the first layer of the model, an encoder $r$ maps the state of each agent to a vector representation, $h_{j}^{0} = r(s_{j})$.

At the output of the communication model, a decoder function $q(h_{j}^{K})$ maps the final hidden state to a distribution over the action space. To obtain a discrete action, we sample $a_{j} \sim q(h_{j}^{K})$.

To sum up, the whole process of this communication model is:

- pass the states of all agents through the encoder: $\textbf{h}^{0} = r(\textbf{s})$
- compute $\textbf{h}$ and $\textbf{c}$ iteratively following Eq. (1) and (2) to obtain $\textbf{h}^{K}$
- sample actions $\textbf{a}$ for all agents according to $q(\textbf{h}^{K})$
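The three steps above can be sketched end to end in NumPy. Several details here are assumptions made for illustration rather than facts from this summary: the encoder $r$ is a linear map followed by tanh, the decoder $q$ is a linear map followed by a softmax, $\sigma = \tanh$, the initial communication vector is zero, and the parameter names `R`, `H`, `C`, `Q` are hypothetical.

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def commnet_forward(s, params, K, rng):
    """Encode states, run K communication steps, sample one action per agent."""
    h = np.tanh(s @ params["R"].T)           # step 1: h^0 = r(s) (assumed encoder)
    c = np.zeros_like(h)                     # assumed initial communication c^0 = 0
    J = h.shape[0]
    for i in range(K):                       # step 2: iterate h and c
        h = np.tanh(h @ params["H"][i].T + c @ params["C"][i].T)
        c = (h.sum(axis=0) - h) / (J - 1)    # mean of the other agents' states
    probs = softmax(h @ params["Q"].T)       # step 3: q(h^K) (assumed decoder)
    actions = np.array([rng.choice(probs.shape[1], p=p) for p in probs])
    return actions, probs

rng = np.random.default_rng(0)
J, state_dim, d, n_actions, K = 5, 4, 8, 3, 2
params = {
    "R": rng.normal(size=(d, state_dim)),
    "H": [rng.normal(size=(d, d)) for _ in range(K)],
    "C": [rng.normal(size=(d, d)) for _ in range(K)],
    "Q": rng.normal(size=(n_actions, d)),
}
s = rng.normal(size=(J, state_dim))          # one state vector per agent
actions, probs = commnet_forward(s, params, K, rng)
```

Every operation before the final sampling step is differentiable, which is the property that lets the communication be learned by backpropagation.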

The whole process of the communication model is illustrated in the figure below:

## Variants of Communication Model

**Local Connectivity**: In this variant, each agent considers only its local neighbours rather than all other agents, so the communication is computed as

$$c_{j}^{i+1} = \frac{1}{|N(j)|} \sum_{j' \in N(j)} h_{j'}^{i+1},$$

where $N(j)$ is the set of agents in the neighbourhood of agent $j$ at the current time.
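One way to sketch local connectivity is with a boolean adjacency matrix that marks which agents are neighbours. The sketch assumes that $N(j)$ excludes agent $j$ itself, that the neighbours' hidden states are averaged, and that an agent with no neighbours receives a zero vector; the name `local_communication` is hypothetical.

```python
import numpy as np

def local_communication(h, adjacency):
    """Average the hidden states of each agent's neighbours.

    h: (J, d) float array of hidden states.
    adjacency: (J, J) array, adjacency[j, k] is truthy if k is in N(j).
    """
    A = adjacency.astype(float).copy()
    np.fill_diagonal(A, 0.0)                  # an agent does not hear itself
    counts = A.sum(axis=1, keepdims=True)     # |N(j)| for each agent
    summed = A @ h                            # sum over neighbours
    # Divide only where there is at least one neighbour; others stay zero.
    return np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)

h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0],
              [5.0, 5.0]])
adjacency = np.array([[0, 1, 0, 0],    # agent 0 sees agent 1
                      [1, 0, 1, 0],    # agent 1 sees agents 0 and 2
                      [0, 1, 0, 0],    # agent 2 sees agent 1
                      [0, 0, 0, 0]])   # agent 3 is isolated
c = local_communication(h, adjacency)
```

Since the adjacency matrix can be recomputed at every step, the set of neighbours is free to change as agents move.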

**Skip Connection**: In this variant, the encoded state $h_{j}^{0}$ is fed to $f^{i}$ as an additional input at every communication step:

$$h_{j}^{i+1} = f^{i}(h_{j}^{i}, c_{j}^{i}, h_{j}^{0}).$$
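As a rough illustration, the skip connection can be added to a single linear layer by giving $h_{j}^{0}$ its own weight matrix. The additive form and the matrix `S` are assumptions made for this sketch, as is $\sigma = \tanh$; the text above only specifies that $h_{j}^{0}$ becomes an extra input to $f^{i}$.

```python
import numpy as np

def comm_step_skip(h, c, h0, H, C, S):
    """Communication layer with a skip connection from the encoded state h0.

    h, c, h0: (J, d) arrays; H, C, S: (d, d) weight matrices.
    S is a hypothetical extra weight matrix applied to h0.
    """
    return np.tanh(h @ H.T + c @ C.T + h0 @ S.T)

rng = np.random.default_rng(2)
J, d = 3, 4
h0 = rng.normal(size=(J, d))                 # encoded states, reused at every step
H, C, S = (rng.normal(size=(d, d)) for _ in range(3))

h = h0
c = np.zeros_like(h)                         # assumed zero initial communication
for _ in range(2):                           # two communication steps
    h = comm_step_skip(h, c, h0, H, C, S)
    c = (h.sum(axis=0) - h) / (J - 1)
```

Setting `S` to zero recovers the plain layer, so the skip connection can be seen as a strict extension of the basic model.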

**Temporal Recurrence**: In this variant, the communication step $i$ in Eq. (1) and (2) is replaced by a time step $t$, and the same module $f^{t}$ is used for all $t$, so that the whole communication process becomes a recurrent neural network (RNN). Following this intuition, the mapping $f^{t}$ can be replaced by other RNN units, e.g. LSTM or GRU.
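A plain-RNN version of this idea might look like the sketch below, where the same weights are reused at every time step. Feeding the current observation into each step through a matrix `R`, starting from zero hidden and communication vectors, and using tanh are all assumptions made for illustration; `recurrent_rollout` is a hypothetical name.

```python
import numpy as np

def recurrent_rollout(observations, R, H, C, h_init):
    """Run the communication model as an RNN over time.

    observations: (T, J, state_dim) array, one state per agent per time step.
    R: (d, state_dim), H and C: (d, d); the same weights are used at every t.
    Returns the hidden states for all time steps, shape (T, J, d).
    """
    h = h_init
    J = h.shape[0]
    c = np.zeros_like(h)                      # assumed zero initial communication
    history = []
    for s_t in observations:
        h = np.tanh(h @ H.T + c @ C.T + s_t @ R.T)
        c = (h.sum(axis=0) - h) / (J - 1)     # mean of the other agents' states
        history.append(h)
    return np.stack(history)

rng = np.random.default_rng(3)
T, J, state_dim, d = 6, 4, 5, 8
observations = rng.normal(size=(T, J, state_dim))
R = rng.normal(size=(d, state_dim))
H = rng.normal(size=(d, d))
C = rng.normal(size=(d, d))
h_init = np.zeros((J, d))

hidden = recurrent_rollout(observations, R, H, C, h_init)
```

Swapping the tanh update for an LSTM or GRU cell would change only the line that computes `h`; the communication step that produces `c` stays the same.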