Learning Disentangled Textual Representations via Statistical Measures of Similarity
Pierre Colombo, Guillaume Staerman, Nathan Noiry, Pablo Piantanida

TL;DR
This paper introduces a family of regularizers based on statistical measures of similarity that disentangle textual representations without training an auxiliary network, making them faster and more effective than existing adversarial and mutual-information methods.
Contribution
The authors propose a novel regularization approach for disentangling sensitive attributes from text representations that does not require training or hyperparameter tuning.
Findings
The proposed regularizers require no training and are faster than adversarial and mutual-information penalties.
They achieve better disentanglement with both pretrained and random encoders.
They reduce optimization complexity compared to adversarial and mutual-information methods.
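To make the idea of a training-free regularizer concrete, here is a minimal sketch of one statistical similarity measure that can be computed in closed form from a batch: the squared Maximum Mean Discrepancy (MMD) with an RBF kernel, evaluated between the representations of two sensitive groups. This is an illustrative assumption rather than the authors' implementation; the summary above does not name the specific measures used. The names `rbf_kernel`, `mmd_squared`, `regularized_loss`, `weight`, and `bandwidth` are hypothetical, and NumPy is used only for clarity (an actual training loop would need an autodiff framework).

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(z0, z1, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of vectors."""
    k00 = rbf_kernel(z0, z0, bandwidth)
    k11 = rbf_kernel(z1, z1, bandwidth)
    k01 = rbf_kernel(z0, z1, bandwidth)
    return k00.mean() + k11.mean() - 2.0 * k01.mean()


def regularized_loss(task_loss, z, sensitive, weight=1.0, bandwidth=1.0):
    """Task loss plus a penalty on the gap between group representations.

    z         : (batch, dim) array of encoder outputs
    sensitive : (batch,) array of binary sensitive-attribute labels
    weight    : hypothetical trade-off coefficient (an assumption)
    """
    z0 = z[sensitive == 0]
    z1 = z[sensitive == 1]
    # No discriminator and no inner optimization loop: the penalty is
    # computed directly from the current batch.
    return task_loss + weight * mmd_squared(z0, z1, bandwidth)
```

The penalty is close to zero when the two groups have similar representation distributions and grows as they separate, so minimizing it alongside the task loss pushes the encoder to discard information about the sensitive attribute. Unlike an adversarial discriminator, it introduces no extra parameters to update.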
Abstract
When working with textual data, a natural application of disentangled representations is fair classification, where the goal is to make predictions without being biased (or influenced) by sensitive attributes that may be present in the data (e.g., age, gender or race). Dominant approaches to disentangling a sensitive attribute from textual representations rely on simultaneously learning a penalization term that involves either an adversarial loss (e.g., a discriminator) or an information measure (e.g., mutual information). However, these methods require the training of a deep neural network with several parameter updates for each update of the representation model. The resulting nested optimization loop is time consuming, adds complexity to the optimization dynamics, and requires careful hyperparameter selection (e.g., learning rates, architecture). In this work,…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks · Hate Speech and Cyberbullying Detection
