From Insight to Intervention: Interpretable Neuron Steering for Controlling Popularity Bias in Recommender Systems
Parviz Ahmadov, Masoud Mansoury

TL;DR
This paper introduces PopSteer, a post-hoc method using a Sparse Autoencoder to interpret and mitigate popularity bias in recommender systems, improving fairness with minimal accuracy loss.
Contribution
It presents a novel, interpretable neuron-level steering approach for popularity bias mitigation in recommender systems, enhancing transparency and control.
Findings
Significantly improves fairness in recommendations.
Maintains recommendation accuracy with minimal impact.
Provides interpretable insights into bias mechanisms.
Abstract
Popularity bias is a pervasive challenge in recommender systems, where a few popular items dominate attention while the majority of less popular items remain underexposed. This imbalance can reduce recommendation quality and lead to unfair item exposure. Although existing mitigation methods address this issue to some extent, they often lack transparency in how they operate. In this paper, we propose a post-hoc approach, PopSteer, that leverages a Sparse Autoencoder (SAE) to both interpret and mitigate popularity bias in recommendation models. The SAE is trained to replicate a trained model's behavior while enabling neuron-level interpretability. By introducing synthetic users with strong preferences for either popular or unpopular items, we identify neurons encoding popularity signals through their activation patterns. We then steer recommendations by adjusting the activations of the…
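The pipeline the abstract outlines (encode with an SAE, compare activations for popular-leaning and unpopular-leaning synthetic users, then adjust the selected neurons) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation. The SAE weights here are random rather than trained. The synthetic user embeddings are drawn from shifted Gaussians. The per-neuron score is a standardised mean difference chosen for illustration. The scaling rule in steer_activations is hypothetical, because the abstract is cut off before it describes how activations are adjusted. All names and sizes (d_model, d_sae, popularity_scores, select_neurons, steer_activations, damp, boost) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, n_users = 16, 64, 200  # hypothetical sizes

# Stand-in for a trained SAE: random weights, assumed for illustration only.
W_enc = rng.normal(scale=0.3, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.3, size=(d_sae, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map embeddings to non-negative SAE neuron activations (ReLU)."""
    return np.maximum(x @ W_enc + b_enc, 0.0)


def decode(z):
    """Reconstruct embeddings from SAE activations."""
    return z @ W_dec + b_dec


def popularity_scores(z_pop, z_unpop, eps=1e-8):
    """Per-neuron standardised mean difference between the two user groups.

    Assumption: this statistic is chosen for the sketch, not taken from the paper.
    """
    mean_diff = z_pop.mean(axis=0) - z_unpop.mean(axis=0)
    pooled_std = np.sqrt(0.5 * (z_pop.var(axis=0) + z_unpop.var(axis=0)))
    return mean_diff / (pooled_std + eps)


def select_neurons(scores, top_k):
    """Return (popular-leaning, unpopular-leaning) neuron indices."""
    order = np.argsort(scores)
    return order[-top_k:], order[:top_k]


def steer_activations(z, pop_idx, unpop_idx, damp=0.0, boost=2.0):
    """Scale popular-leaning neurons by `damp` and unpopular-leaning ones by `boost`.

    Hypothetical steering rule; the paper's actual adjustment may differ.
    """
    z_steered = z.copy()
    z_steered[:, pop_idx] *= damp
    z_steered[:, unpop_idx] *= boost
    return z_steered


# Synthetic user embeddings (assumed): two groups shifted in opposite directions.
direction = rng.normal(size=d_model)
x_pop = rng.normal(size=(n_users, d_model)) + direction
x_unpop = rng.normal(size=(n_users, d_model)) - direction

scores = popularity_scores(encode(x_pop), encode(x_unpop))
pop_idx, unpop_idx = select_neurons(scores, top_k=5)

# Steer a small batch of ordinary user embeddings.
x = rng.normal(size=(8, d_model))
z = encode(x)
z_steered = steer_activations(z, pop_idx, unpop_idx)
x_steered = decode(z_steered)
```

In this sketch, x_steered would replace the original embedding before the recommender's scoring step; where exactly the SAE attaches to the model is also an assumption here.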
Taxonomy
Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Sentiment Analysis and Opinion Mining
