From Knots to Knobs: Towards Steerable Collaborative Filtering Using Sparse Autoencoders
Martin Spišák, Ladislav Peška, Petr Škoda, Vojtěch Vančura, Rodrigo Alves

TL;DR
This paper applies sparse autoencoders, a tool best known from large language model interpretability, to collaborative filtering. The goal is to extract interpretable features from representations learned purely from interaction signals and to enable targeted steering of recommendations.
Contribution
According to the authors, it is the first work to adapt sparse autoencoders to collaborative filtering, with the aim of making recommender systems more interpretable and controllable.
Findings
Inserting an SAE into a collaborative autoencoder yields largely monosemantic representations
The proposed mapping functions link semantic concepts to individual neurons
Activating selected neurons steers recommendations toward the targeted concept
Abstract
Sparse autoencoders (SAEs) have recently emerged as pivotal tools for introspection into large language models. SAEs can uncover high-quality, interpretable features at different levels of granularity and enable targeted steering of the generation process by selectively activating specific neurons in their latent activations. Our paper is the first to apply this approach to collaborative filtering, aiming to extract similarly interpretable features from representations learned purely from interaction signals. In particular, we focus on a widely adopted class of collaborative autoencoders (CFAEs) and augment them by inserting an SAE between their encoder and decoder networks. We demonstrate that such representation is largely monosemantic and propose suitable mapping functions between semantic concepts and individual neurons. We also evaluate a simple yet effective method that utilizes…
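The architecture described in the abstract can be illustrated with a minimal sketch: an SAE placed between the encoder and decoder of a collaborative autoencoder, with steering done by overwriting chosen latent neurons. Everything concrete here is an assumption made for illustration and is not taken from the paper: the linear encoder and decoder, the random (untrained) weights, the top-k sparsity rule, the layer sizes, and the names `topk_sparsify` and `recommend`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: items, dense embedding width, SAE width, active neurons.
n_items, d_model, d_sae, k = 50, 16, 64, 4

# Stand-in for a pretrained CFAE (random weights, for illustration only).
W_enc = rng.normal(scale=0.1, size=(n_items, d_model))
W_dec = rng.normal(scale=0.1, size=(d_model, n_items))

# Stand-in for an SAE trained on the CFAE embeddings (also random here).
S_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
S_dec = rng.normal(scale=0.1, size=(d_sae, d_model))


def topk_sparsify(h, k):
    """Apply ReLU, then keep the k largest activations per row (assumed sparsity rule)."""
    h = np.maximum(h, 0.0)
    idx = np.argpartition(h, -k, axis=-1)[..., -k:]
    out = np.zeros_like(h)
    np.put_along_axis(out, idx, np.take_along_axis(h, idx, axis=-1), axis=-1)
    return out


def recommend(x, steer=None):
    """Score all items for interaction vectors x; optionally overwrite SAE neurons."""
    z = x @ W_enc                      # CFAE encoder: interactions -> dense embedding
    h = topk_sparsify(z @ S_enc, k)    # SAE encoder: dense embedding -> sparse latent
    if steer is not None:
        h = h.copy()
        for neuron, value in steer.items():
            h[..., neuron] = value     # steering: force a chosen neuron to a set value
    z_hat = h @ S_dec                  # SAE decoder: sparse latent -> dense embedding
    return z_hat @ W_dec               # CFAE decoder: dense embedding -> item scores


# Usage: one user who interacted with items 1, 5 and 9.
x = np.zeros((1, n_items))
x[0, [1, 5, 9]] = 1.0
base_scores = recommend(x)
steered_scores = recommend(x, steer={7: 5.0})  # neuron 7 is an arbitrary choice
```

In a real system the neuron index would come from a concept-to-neuron mapping such as the ones the paper proposes, and the weights would be trained. The sketch only shows where the SAE sits and where the intervention happens.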
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Topic Modeling
