Shielded RecRL: Explanation Generation for Recommender Systems without Ranking Degradation

Ansh Tiwari; Ayush Chauhan

arXiv:2601.03608·cs.IR·January 8, 2026

Open Access

TL;DR

Shielded RecRL is a reinforcement learning method that generates personalized explanations for recommender systems without compromising their ranking performance, using a two-tower architecture and a composite reward signal.

Contribution

The paper introduces a novel RL approach that combines a two-tower architecture with gradient shielding, so that a language model learns to produce explanations without degrading the recommender's ranking accuracy.
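As a rough illustration of the shielding idea, the sketch below keeps a toy ranking tower's parameters out of reach of any update while an explanation tower is trained. The class `TwoTower`, the function `shielded_step`, and all parameter names and values are hypothetical; this page does not specify the paper's actual shielding mechanism, so this is only one plausible reading.

```python
from dataclasses import dataclass, field


@dataclass
class TwoTower:
    # Hypothetical toy parameters, not the paper's models.
    ranker: dict = field(default_factory=lambda: {"w_item": [0.9, 0.1, 0.5]})
    explainer: dict = field(default_factory=lambda: {"w_adapter": [0.0, 0.0]})


def rank_items(model):
    """Return item indices sorted by ranking-tower score, best first."""
    scores = model.ranker["w_item"]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def shielded_step(model, grads, lr=0.1):
    """Apply a gradient-ascent step, discarding any gradient aimed at the ranker."""
    for name, grad in grads.items():
        if name in model.ranker:
            continue  # shield: ranking parameters never receive updates
        params = model.explainer[name]
        model.explainer[name] = [p + lr * g for p, g in zip(params, grad)]


model = TwoTower()
before = rank_items(model)
# Even if the RL loss produces a gradient for the ranker, it is dropped.
shielded_step(model, {"w_item": [5.0, -5.0, 5.0], "w_adapter": [1.0, -2.0]})
after = rank_items(model)
```

In this toy setup the item order is identical before and after the step, while the explanation tower's adapter weights have moved.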

Findings

1. 22.5% relative increase in click-through rate (CTR) on an Amazon Books dataset
2. Maintains the recommender's item-ranking performance while improving explanations
3. Achieves an effective balance between explanation quality and ranking stability
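The abstract reports the CTR gain as relative (22.5%, or 1.225x over baseline). A small worked example of what that means follows; the baseline CTR value is hypothetical and is not taken from the paper.

```python
def relative_ctr_lift(baseline_ctr, treated_ctr):
    """Relative lift: how much higher the treated CTR is, as a fraction of baseline."""
    return treated_ctr / baseline_ctr - 1.0


baseline = 0.040            # hypothetical baseline CTR (4.0%), for illustration only
treated = baseline * 1.225  # the 1.225x multiplier reported in the abstract
lift = relative_ctr_lift(baseline, treated)
absolute_gain = treated - baseline  # in CTR units, much smaller than 22.5 points
```

With this assumed baseline, a 22.5% relative lift corresponds to an absolute gain of under one percentage point of CTR.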

Abstract

We introduce Shielded RecRL, a reinforcement learning approach to generate personalized explanations for recommender systems without sacrificing the system's original ranking performance. Unlike prior RLHF-based recommender methods that directly optimize item rankings, our two-tower architecture keeps the recommender's ranking model intact while a language model learns to produce helpful explanations. We design a composite reward signal combining explanation length, content relevance, and coherence, and apply proximal policy optimization (PPO) with a KL-divergence constraint to fine-tune a large language model with only 0.4% of its parameters trainable via LoRA adapters. In experiments on an Amazon Books dataset (approximately 50K interactions in the fantasy and romance genres), Shielded RecRL improved the relative click-through rate (CTR) by 22.5% (1.225x over baseline) while keeping…
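A minimal sketch of how a composite reward over length, relevance, and coherence might be assembled and then combined with a KL penalty, as is common in PPO-based fine-tuning. The scoring functions, the weights, `target_len`, and `beta` are all assumptions made for illustration; the abstract does not give the paper's actual definitions, and a KL penalty on the reward is only one common way to impose a KL constraint.

```python
def length_score(tokens, target_len=30):
    """1.0 at the target length, decaying linearly to 0 (hypothetical shaping)."""
    return max(0.0, 1.0 - abs(len(tokens) - target_len) / target_len)


def relevance_score(tokens, item_keywords):
    """Fraction of item keywords that appear in the explanation."""
    if not item_keywords:
        return 0.0
    vocab = set(tokens)
    return sum(k in vocab for k in item_keywords) / len(item_keywords)


def coherence_score(tokens):
    """Crude proxy: share of distinct tokens, which penalizes repetition."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def composite_reward(text, item_keywords, weights=(0.2, 0.5, 0.3), target_len=30):
    """Weighted sum of the three component scores (weights are assumptions)."""
    tokens = text.lower().split()  # whitespace tokenization, for simplicity
    w_len, w_rel, w_coh = weights
    return (w_len * length_score(tokens, target_len)
            + w_rel * relevance_score(tokens, item_keywords)
            + w_coh * coherence_score(tokens))


def kl_penalized_reward(reward, logp_policy, logp_ref, beta=0.1):
    """Subtract beta times a per-sample KL estimate, sum(log pi - log pi_ref)."""
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return reward - beta * kl_estimate


r = composite_reward("a sweeping fantasy romance with dragons",
                     ["fantasy", "dragons", "magic"], target_len=6)
shaped = kl_penalized_reward(r, logp_policy=[-1.0, -2.0], logp_ref=[-1.5, -2.5])
```

Here the explanation hits the target length, mentions two of three keywords, and repeats no tokens; the KL term then lowers the reward because the policy assigns higher log-probability to its tokens than the reference model does.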

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques · Advanced Bandit Algorithms Research