# Privacy-Preserving Classification with Secret Vector Machines

**Authors:** Valentin Hartmann, Konark Modi, Josep M. Pujol, Robert West

arXiv: 1907.03373 · 2020-08-21

## TL;DR

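This paper introduces SecVM (secret vector machine), a method for training linear SVM classifiers in a distributed setting where the training server is untrusted and potentially malicious. Clients and server exchange messages designed not to reveal personally identifiable information. An offline evaluation on tweets shows no loss in classification performance, and an online deployment in the Cliqz web browser with thousands of clients outperforms baselines by a large margin.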

## Contribution

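The paper presents SecVM, a framework for training linear SVMs in a distributed setting akin to federated learning, but under the stricter privacy requirement that the training server is assumed to be untrusted and potentially malicious. User privacy is thus preserved by design rather than by trust.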

## Key findings

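- Offline, SecVM is trained to predict user gender from tweets and preserves user privacy without sacrificing classification performance.
- Online, SecVM's distributed framework is deployed in the Cliqz web browser to predict user gender with thousands of clients, and it outperforms baselines by a large margin.
- The authors take the online deployment as evidence that SecVM is suitable for production environments.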

## Abstract

Today, large amounts of valuable data are distributed among millions of user-held devices, such as personal computers, phones, or Internet-of-things devices. Many companies collect such data with the goal of using it for training machine learning models allowing them to improve their services. User-held data is, however, often sensitive, and collecting it is problematic in terms of privacy. We address this issue by proposing a novel way of training a supervised classifier in a distributed setting akin to the recently proposed federated learning paradigm, but under the stricter privacy requirement that the server that trains the model is assumed to be untrusted and potentially malicious. We thus preserve user privacy by design, rather than by trust. In particular, our framework, called secret vector machine (SecVM), provides an algorithm for training linear support vector machines (SVM) in a setting in which data-holding clients communicate with an untrusted server by exchanging messages designed to not reveal any personally identifiable information. We evaluate our model in two ways. First, in an offline evaluation, we train SecVM to predict user gender from tweets, showing that we can preserve user privacy without sacrificing classification performance. Second, we implement SecVM's distributed framework for the Cliqz web browser and deploy it for predicting user gender in a large-scale online evaluation with thousands of clients, outperforming baselines by a large margin and thus showcasing that SecVM is suitable for production environments.
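To make the setting in the abstract concrete, here is a minimal sketch of distributed linear SVM training by subgradient descent on the hinge loss. It is not the SecVM protocol: the privacy-preserving message design is not described in this summary and is not reproduced here, so the raw subgradient messages below would leak client data. All names (`local_subgradient`, `server_step`, `train`, `predict`), the hyperparameters, and the toy data are assumptions made for illustration.

```python
import random


def local_subgradient(w, x, y):
    """Client side: hinge-loss subgradient for one private example (x, y), y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1.0:
        # The example violates the margin, so it contributes -y * x.
        return [-y * xi for xi in x]
    return [0.0] * len(w)


def server_step(w, client_messages, lam, lr):
    """Server side: average the client subgradients, add L2 regularization, take one step."""
    n = len(client_messages)
    avg = [sum(g[j] for g in client_messages) / n for j in range(len(w))]
    return [wj - lr * (lam * wj + gj) for wj, gj in zip(w, avg)]


def train(clients, dim, rounds=200, lam=0.01, lr=0.1):
    """Run several rounds: broadcast w, collect one message per client, update w."""
    w = [0.0] * dim
    for _ in range(rounds):
        # In a real deployment each client would compute this on its own device.
        messages = [local_subgradient(w, x, y) for x, y in clients]
        w = server_step(w, messages, lam, lr)
    return w


def predict(w, x):
    """Linear decision rule: the sign of the dot product w . x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical toy data: one (x, y) pair per client; the last feature is a constant bias term.
rng = random.Random(0)
clients = []
for _ in range(50):
    y = rng.choice([-1, 1])
    x = [y * 2.0 + rng.uniform(-0.5, 0.5), rng.uniform(-1.0, 1.0), 1.0]
    clients.append((x, y))

w = train(clients, dim=3)
accuracy = sum(predict(w, x) == y for x, y in clients) / len(clients)
print("training accuracy:", accuracy)
```

The split between `local_subgradient` and `server_step` mirrors the client/server roles named in the abstract. A privacy-preserving scheme such as SecVM would change what the clients send, not the underlying SVM objective.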

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.03373/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1907.03373/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1907.03373/full.md

---
Source: https://tomesphere.com/paper/1907.03373