# Federated Learning over Wireless Fading Channels

**Authors:** Mohammad Mohammadi Amiri, Deniz Gunduz

arXiv: 1907.09769 · 2020-02-12

## TL;DR

This paper compares digital and analog distributed stochastic gradient descent (DSGD) schemes for federated learning over wireless fading channels and shows that analog over-the-air transmission yields faster convergence and higher accuracy.

## Contribution

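It introduces compressed analog DSGD (CA-DSGD), a novel scheme that exploits the additive nature of the wireless multiple access channel (MAC), so that the devices' compressed gradient transmissions are summed over the air, improving federated learning performance. It also proposes a digital baseline, D-DSGD, that schedules one device per iteration based on channel conditions.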
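Per the abstract, each CA-DSGD device sparsifies its gradient estimate while accumulating the sparsification error, then projects the sparse vector to a low-dimensional vector before analog transmission. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: top-k magnitude sparsification, a shared pseudo-random Gaussian projection matrix `A` known to all devices and the PS, and an idealized unit-gain channel (perfect power control) with additive Gaussian noise. The names `Device`, `top_k_sparsify`, `project`, and `over_the_air` are hypothetical, and the PS-side sparse-recovery step is not shown.

```python
import math
import random


def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def project(matrix, vec):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


class Device:
    """One wireless device running a hypothetical CA-DSGD-style encoder."""

    def __init__(self, dim):
        # Error accumulator: gradient mass dropped by sparsification so far.
        self.error = [0.0] * dim

    def encode(self, grad, k, A):
        # Add back the error accumulated in earlier iterations.
        corrected = [g + e for g, e in zip(grad, self.error)]
        sparse = top_k_sparsify(corrected, k)
        # Store what was dropped so it can be transmitted in later iterations.
        self.error = [c - s for c, s in zip(corrected, sparse)]
        # Compress the d-dimensional sparse vector to len(A) dimensions.
        return project(A, sparse), sparse


def over_the_air(signals, noise_std, rng):
    """Idealized MAC output: entry-wise sum of all transmissions plus Gaussian noise."""
    length = len(signals[0])
    return [sum(x[j] for x in signals) + rng.gauss(0.0, noise_std) for j in range(length)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, s, k, num_devices = 8, 4, 2, 3
    # Shared pseudo-random projection matrix (assumed known to devices and PS).
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(s)) for _ in range(d)] for _ in range(s)]
    devices = [Device(d) for _ in range(num_devices)]
    grads = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_devices)]
    encoded = [dev.encode(g, k, A) for dev, g in zip(devices, grads)]
    y = over_the_air([sig for sig, _ in encoded], noise_std=0.1, rng=rng)
    print(len(y))  # s received symbols, one per channel use
```

Because the projection is linear, the PS receives `A` times the sum of the devices' sparse vectors plus noise, so a single recovery step targets the aggregate gradient directly rather than each device's gradient separately. This is the sense in which the scheme exploits the additive MAC.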

## Key findings

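- CA-DSGD generally converges faster than the digital D-DSGD scheme and other schemes in the literature.
- CA-DSGD reaches a higher accuracy, and its advantage over digital schemes grows when the devices' datasets are not i.i.d.
- The analog scheme is robust to imperfect channel state information (CSI) at the devices.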

## Abstract

We study federated machine learning at the wireless network edge, where limited power wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and propose various techniques to implement distributed stochastic gradient descent (DSGD). We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits imposed by the channel condition, and transmits these bits to the PS in a reliable manner. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as the compressed analog DSGD (CA-DSGD), where the devices first sparsify their gradient estimates while accumulating error, and project the resultant sparse vector into a low-dimensional vector for bandwidth reduction. Numerical results show that D-DSGD outperforms other digital approaches in the literature; however, in general the proposed CA-DSGD algorithm converges faster than the D-DSGD scheme and other schemes in the literature, and reaches a higher level of accuracy. We have observed that the gap between the analog and digital schemes increases when the datasets of devices are not independent and identically distributed (i.i.d.). Furthermore, the performance of the CA-DSGD scheme is shown to be robust against imperfect channel state information (CSI) at the devices. Overall these results show clear advantages for the proposed analog over-the-air DSGD scheme, which suggests that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
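The D-DSGD step described in the abstract (schedule one device opportunistically, then quantize its gradient to the number of bits the channel can carry) can be sketched as follows. Every concrete choice here is an assumption for illustration and may differ from the paper: the scheduler picks the device with the largest channel magnitude |h|, the bit budget is the complex AWGN capacity s·log2(1 + P|h|²/σ²) over s channel uses, and `quantize` is a hypothetical sign/mean top-q quantizer standing in for the paper's quantizer. Function names are hypothetical.

```python
import math


def select_device(channel_gains):
    """Opportunistic scheduling: index of the device with the largest |h|."""
    return max(range(len(channel_gains)), key=lambda m: abs(channel_gains[m]))


def bit_budget(num_channel_uses, power, gain, noise_var):
    """Bits deliverable this iteration, assuming complex AWGN capacity per channel use."""
    snr = power * abs(gain) ** 2 / noise_var
    return math.floor(num_channel_uses * math.log2(1.0 + snr))


def quantize(grad, budget, float_bits=32):
    """Hypothetical sign/mean quantizer that fits the gradient into `budget` bits.

    Keeps the q largest-magnitude entries and sends each one's index and sign,
    plus one shared float: the mean magnitude of the kept entries.
    Returns (vector reconstructed at the PS, bits used).
    """
    d = len(grad)
    index_bits = max(1, math.ceil(math.log2(d)))
    # Largest q such that float_bits + q * (sign bit + index bits) <= budget.
    q = max(0, min(d, (budget - float_bits) // (1 + index_bits)))
    if q == 0:
        return [0.0] * d, 0
    keep = sorted(range(d), key=lambda i: abs(grad[i]), reverse=True)[:q]
    mean_mag = sum(abs(grad[i]) for i in keep) / q
    out = [0.0] * d
    for i in keep:
        out[i] = math.copysign(mean_mag, grad[i])
    return out, float_bits + q * (1 + index_bits)
```

For example, with gains `[0.3+0.1j, 1.2-0.5j, 0.7+0j]` the scheduler picks device 1, and a weaker channel shrinks the budget so fewer gradient entries survive quantization. This dependence of the transmitted information on a single device's channel is what the analog scheme avoids by letting all devices transmit simultaneously.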

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.09769/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1907.09769/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1907.09769/full.md

---
Source: https://tomesphere.com/paper/1907.09769