Hidden Signals in Language: Inferring Sensitive Attributes from Reddit Comments Using Machine Learning
Anay Agarwalla, Simeon Sayer

TL;DR
This study demonstrates that simple machine learning models can detect sensitive personal attributes from Reddit comments, raising privacy concerns about latent signals in user-generated text.
Contribution
The study shows that lightweight classifiers can infer sensitive attributes from language data, which highlights privacy risks and the need for transparency in language models.
Findings
Lightweight models can classify sensitive attributes better than a naive guess, with statistically significant results.
Gender and age are more easily predicted than personality traits.
Predictive accuracy varies across different Reddit communities.
Abstract
Sensitive attributes are legally protected characteristics that should not be used to discriminate. Careful steps have been taken to minimize the risk of human bias regarding attributes such as race and age. Large language models (LLMs) are similarly trained not to attempt to infer these attributes. However, just because they shouldn't doesn't mean they don't. Given chat-like text fragments from authors tagged with sensitive attributes (e.g., MBTI personality, country of origin, gender), a model can often classify these attributes better than a naive guess, with results depending on the combination of subject matter and attribute. The text of these comments is converted into numerical representations using embedding models, and those representations are then used to train relatively simple classifiers such as logistic regression and decision trees. This study's results show that even these…
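The pipeline described in the abstract (text to numerical vectors, then a simple classifier, then comparison with a naive guess) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a normalized bag-of-words vector as a stand-in for a real embedding model, a hand-written logistic regression instead of a library one, and a handful of invented sentences whose binary label (a hypothetical "spelling region" attribute) exists only for illustration. All function names are hypothetical.

```python
import math

def build_vocab(texts):
    """Map each training token to a column index (stand-in for an embedding model)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def embed(text, vocab):
    """Toy 'embedding': L2-normalized bag-of-words counts; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train_logistic_regression(X, y, epochs=300, lr=0.5):
    """Binary logistic regression fitted with plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

# Invented toy data: label 1 = British spelling, label 0 = American spelling.
train = [
    ("my favourite colour is green", 1),
    ("the neighbours organised a football match", 1),
    ("i realised the theatre programme had changed", 1),
    ("we travelled to the city centre", 1),
    ("my favorite color is green", 0),
    ("the neighbors organized a soccer game", 0),
    ("i realized the theater program had changed", 0),
    ("we traveled to the city center", 0),
]
test = [
    ("what colour is your favourite jumper", 1),
    ("what color is your favorite sweater", 0),
    ("the theatre in the centre was closed", 1),
    ("the theater in the center was closed", 0),
]

vocab = build_vocab(text for text, _ in train)
X_train = [embed(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
X_test = [embed(text, vocab) for text, _ in test]
y_test = [label for _, label in test]

w, b = train_logistic_regression(X_train, y_train)
test_acc = accuracy([predict(w, b, x) for x in X_test], y_test)

# Naive guess: always predict the most common training label.
majority = max(set(y_train), key=y_train.count)
baseline_acc = accuracy([majority] * len(y_test), y_test)

print(f"classifier accuracy: {test_acc:.2f}, naive baseline: {baseline_acc:.2f}")
```

In a realistic setting the `embed` function would be replaced by a pretrained sentence-embedding model and the classifier by a library implementation. The comparison against the majority-class baseline is the part that corresponds to the abstract's "better than a naive guess" criterion.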
