# Learning Private Neural Language Modeling with Attentive Aggregation

**Authors:** Shaoxiong Ji, Shirui Pan, Guodong Long, Xue Li, Jing Jiang, Zi Huang

arXiv: 1812.07108 · 2020-09-22

## TL;DR

This paper introduces an attentive aggregation method for federated learning in private neural language modeling. It weights each client's contribution during server-side model aggregation and reports lower perplexity and communication cost than its counterparts in most of the compared settings.

## Contribution

The paper proposes an attention-based aggregation technique. It weights each client model by its contribution to the global model and adds an optimization step on the server during aggregation, with the aim of improving the generalization of the global model in federated learning.
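One way to write the server-side optimization as a formula is shown below. This is a sketch under stated assumptions, not the paper's exact formulation (see the full text for that). The symbols are illustrative: $w$ for the server parameters, $w_k$ for client $k$'s parameters, $K$ for the number of clients, $\alpha_k$ for client $k$'s attention weight, and $\epsilon$ for a server step size. The objective is a weighted squared distance, and the update is one gradient step on it with the weights held fixed:

$$
\min_{w} \; \sum_{k=1}^{K} \alpha_k \, \tfrac{1}{2} \lVert w - w_k \rVert^2,
\qquad
w \leftarrow w - \epsilon \sum_{k=1}^{K} \alpha_k \,(w - w_k)
$$

If the weights satisfy $\sum_k \alpha_k = 1$ and $\epsilon = 1$, the update reduces to the weighted average $\sum_k \alpha_k w_k$. Uniform weights $\alpha_k = 1/K$ then recover plain averaging of the client models.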

## Key findings

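- Achieves lower perplexity than its averaging-based counterparts in most comparison settings.
- Reduces communication cost in federated learning in most comparison settings.
- Evaluated on two popular language modeling datasets and one social media dataset.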
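Perplexity, the metric these findings are stated in, is the exponentiated average per-token negative log-likelihood, and lower values are better. The minimal illustration below is not taken from the paper's code. The function name `perplexity` is hypothetical, and it assumes natural-log token probabilities.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to each
    observed token (hypothetical input format for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token has perplexity of about 4.
print(perplexity([math.log(0.25)] * 10))
```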

## Abstract

Mobile keyboard suggestion is typically regarded as a word-level language modeling problem. Centralized machine learning techniques require massive amounts of user data to be collected for training, which may raise privacy concerns about users' sensitive personal typing data. Federated learning (FL) provides a promising approach to learning private language models for intelligent personalized keyboard suggestion by training models on distributed clients rather than on a central server. To obtain a global model for prediction, existing FL algorithms simply average the client models and ignore the importance of each client during model aggregation. Furthermore, there is no optimization for learning a well-generalized global model on the central server. To solve these problems, we propose a novel model aggregation method with an attention mechanism that considers the contribution of each client model to the global model, together with an optimization technique during server aggregation. Our proposed attentive aggregation method minimizes the weighted distance between the server model and the client models through iterative parameter updates, while attending to the distance between the server model and each client model. In experiments on two popular language modeling datasets and a social media dataset, our proposed method outperforms its counterparts in terms of perplexity and communication cost in most of the compared settings.
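The aggregation step described in the abstract can be sketched in plain Python. Several choices here are assumptions and do not come from the summary:

- Models are represented as dicts mapping a layer name to a flat list of floats.
- Attention is computed per layer as a softmax over the L2 distances between the server and each client.
- The server then takes one gradient step of size `epsilon` on the weighted distance.

Whether larger distances should receive larger weights is a modelling choice, so check the full paper for the authors' exact scoring. The names `attentive_aggregate`, `softmax`, `l2_distance` and `epsilon` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def l2_distance(a, b):
    """Euclidean distance between two flat parameter lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def attentive_aggregate(server, clients, epsilon=1.0):
    """One server-side aggregation step (illustrative sketch).

    server:  dict layer_name -> flat list of floats (current global weights)
    clients: list of dicts with the same layout (locally trained weights)
    epsilon: server step size (hypothetical parameter name)
    """
    new_server = {}
    for name, w in server.items():
        # Assumed attention: softmax over per-client distances for this layer.
        alphas = softmax([l2_distance(w, c[name]) for c in clients])
        # Gradient of sum_k alpha_k * 0.5 * ||w - w_k||^2 w.r.t. w, alphas held fixed.
        grad = [
            sum(a * (wi - c[name][i]) for a, c in zip(alphas, clients))
            for i, wi in enumerate(w)
        ]
        # Gradient step: move the server weights toward the weighted clients.
        new_server[name] = [wi - epsilon * g for wi, g in zip(w, grad)]
    return new_server

server = {"embed": [0.0, 0.0]}
clients = [{"embed": [1.0, 1.0]}, {"embed": [1.0, 1.0]}]
print(attentive_aggregate(server, clients))               # {'embed': [1.0, 1.0]}
print(attentive_aggregate(server, clients, epsilon=0.5))  # {'embed': [0.5, 0.5]}
```

With identical clients the attention weights are uniform. At `epsilon=1.0` the server moves all the way to the clients' weights, while a smaller `epsilon` moves it only part of the way. Running the step repeatedly corresponds to the "iterative parameter updates" the abstract mentions.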

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07108/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07108/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1812.07108/full.md

---
Source: https://tomesphere.com/paper/1812.07108