# affinity: A System for Latent User Similarity Comparison on Texting Data

**Authors:** Tobias Eichinger, Felix Beierle, Sumsam Ullah Khan, Robin Middelanis, Veeraraghavan Sekar, Sam Tabibzadeh

arXiv: 1904.01897 · 2019-04-04

## TL;DR

Affinity is a privacy-preserving system that compares users' text messaging histories in a latent space, so that user similarity can be assessed without exposing private data. On the Twitter histories of 60 US senators, the resulting similarity network reaches an average accuracy of 85.0% on political party classification.

## Contribution

The paper introduces affinity, a system for privacy-preserving user similarity comparison on text messaging data, in which users' texting histories are compared in a latent space rather than as plain text.

## Key findings

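- Achieved an average accuracy of 85.0% on a political party classification task using the resulting similarity network.
- Demonstrated similarity assessment on the Twitter histories of 60 US senators.
- Preserved privacy by keeping raw texting data on user devices and comparing only a latent format from which neither the comparison words nor the original plain text can be reconstructed.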
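This summary does not specify affinity's latent format. As a purely illustrative sketch, assume that each device trains word embeddings on its local texting history and shares only the matrix of pairwise cosine similarities among a shared list of comparison words; two users are then compared through these matrices alone. `COMPARISON_WORDS`, `latent_profile`, and `profile_similarity` are hypothetical names, and the scoring rule (one minus half the mean absolute difference) is an assumption, not the paper's measure.

```python
import numpy as np

# Hypothetical list of comparison words shared by all devices (illustrative only).
COMPARISON_WORDS = ["tax", "health", "border", "climate"]


def latent_profile(embeddings, words):
    """Runs on the user's device (assumption of this sketch).

    `embeddings` maps a word to its locally trained vector. The return value is
    the matrix of pairwise cosine similarities among `words`; in this sketch
    only this matrix would leave the device, never the texts or the vectors.
    """
    vecs = np.stack([np.asarray(embeddings[w], dtype=float) for w in words])
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalise rows
    return vecs @ vecs.T


def profile_similarity(a, b):
    """Score in [0, 1] between two latent profiles; 1.0 means identical."""
    # The matrices are symmetric with a unit diagonal, so only the strict
    # upper triangle carries information.
    iu = np.triu_indices_from(a, k=1)
    # Cosine similarities lie in [-1, 1], so each absolute difference is in [0, 2].
    return 1.0 - float(np.mean(np.abs(a[iu] - b[iu]))) / 2.0


# Toy usage with random stand-in embeddings for two users.
rng = np.random.default_rng(0)
alice = {w: rng.normal(size=8) for w in COMPARISON_WORDS}
bob = {w: rng.normal(size=8) for w in COMPARISON_WORDS}
score = profile_similarity(latent_profile(alice, COMPARISON_WORDS),
                           latent_profile(bob, COMPARISON_WORDS))
print(f"similarity(alice, bob) = {score:.3f}")
```

Because the profile is built from cosine similarities, it is invariant to rescaling a user's vectors and reveals no individual embedding. This sketch does not attempt to hide which comparison words are used.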

## Abstract

In the field of social networking services, finding similar users based on profile data is common practice. Smartphones harbor sensor and personal context data that can be used for user profiling. Yet one vast source of personal data, namely text messaging data, has hardly been studied for user profiling. We see three reasons for this: First, private text messaging data is not shared due to its intimate character. Second, the definition of an appropriate privacy-preserving similarity measure is non-trivial. Third, assessing the quality of a similarity measure on text messaging data representing a potentially infinite set of topics is non-trivial. In order to overcome these obstacles, we propose affinity, a system that assesses the similarity between text messaging histories of users reliably and efficiently in a privacy-preserving manner. Private texting data stays on user devices, and the data used for comparison is compared in a latent format from which neither the comparison words nor any original private plain text can be reconstructed. We evaluate our approach by calculating similarities between the Twitter histories of 60 US senators. The resulting similarity network reaches an average accuracy of 85.0% on a political party classification task.
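The abstract evaluates the similarity network through political party classification but does not say which classifier produced the 85.0% figure. A minimal sketch, assuming a leave-one-out k-nearest-neighbour majority vote over a precomputed user-by-user similarity matrix (a hypothetical choice; `knn_party_accuracy` and the toy matrix are illustrative only):

```python
from collections import Counter

import numpy as np


def knn_party_accuracy(sim, labels, k=3):
    """Leave-one-out k-nearest-neighbour accuracy on a similarity network.

    sim[i, j] is the similarity between users i and j (higher = more similar).
    Each user receives the majority label among its k most similar *other*
    users; a user's own label is never used for its own prediction.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Rank the other users by descending similarity (stable sort keeps index order on ties).
        neighbours = sorted(others, key=lambda j: sim[i, j], reverse=True)[:k]
        votes = Counter(labels[j] for j in neighbours)
        # most_common breaks count ties in favour of the label seen first in neighbour order.
        predicted = votes.most_common(1)[0][0]
        correct += int(predicted == labels[i])
    return correct / n


# Toy network: two blocks of three users each, highly similar within a block.
labels = ["D", "D", "D", "R", "R", "R"]
sim = np.full((6, 6), 0.2)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)
print(knn_party_accuracy(sim, labels, k=3))  # 1.0
```

With an odd `k` and two parties, majority votes cannot tie; leave-one-out evaluation keeps each senator's own label out of its prediction.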

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.01897/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1904.01897/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1904.01897/full.md

---
Source: https://tomesphere.com/paper/1904.01897