A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation
Aditya Tiwari, Konduri Naga Lakshmi Rekha, Rajesh Kumar Mundotiya

TL;DR
This paper tests whether PO4ISR's reasoning-based session recommendation reproduces across diverse semantic domains, finds that its prompts become unstable on long sessions in semantically complex datasets, and proposes PO4ISR++, which improves performance by up to 54% on Games and 96% on Bundle.
Contribution
The authors conduct a reproducibility study of PO4ISR, identify its limitations, and introduce PO4ISR++, a dynamic prompting method that improves stability and performance across multiple datasets.
Findings
The original PO4ISR prompts drift in long sessions, which degrades performance on semantically complex datasets such as Games and Bundle.
PO4ISR++ restores and improves performance, with gains of up to 54% on Games and 96% on Bundle.
Open-source artifacts are released for future research.
Abstract
Reasoning-based Large Language Models (LLMs) like PO4ISR have set new benchmarks in session-based recommendation. However, the reproducibility of their reasoning capabilities across diverse semantic domains remains unexplored. In this work, we conduct a rigorous reproducibility study of PO4ISR to assess its generalization limits. Our analysis reveals a critical failure mode: standard reasoning prompts suffer from severe contextual drift in long sessions, leading to performance degradation on semantically complex datasets like Games and Bundle. To quantify and resolve this stability gap, we introduce PO4ISR++, a robustness-enhanced implementation that integrates reflexive prompting and consistent rank detection. Unlike the original static prompting strategy, our approach dynamically adapts to cross-domain cues. We benchmark both the original implementation and our robust variant on…
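The abstract names "consistent rank detection" without describing it. The sketch below is one plausible reading, not the authors' implementation: it recovers the 1-based rank of the ground-truth item from a free-form LLM reranking response by ordering candidates according to where they are first mentioned. The names `detect_rank` and `hit_at_k`, the candidate titles, and the matching rule are all hypothetical.

```python
import re
from typing import List, Optional


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so minor formatting changes do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def detect_rank(llm_output: str, candidates: List[str], target: str) -> Optional[int]:
    """Return the 1-based rank of `target` in the LLM's response, or None if absent.

    Assumption: the response lists candidate titles in ranked order, so the
    position of each title's first mention defines its rank.
    """
    text = _norm(llm_output)
    mentions = []
    for title in candidates:
        pos = text.find(_norm(title))
        if pos >= 0:  # candidate was mentioned somewhere in the response
            mentions.append((pos, title))
    ranked = [title for _, title in sorted(mentions)]
    return ranked.index(target) + 1 if target in ranked else None


def hit_at_k(rank: Optional[int], k: int) -> float:
    """Hit@K for a single session: 1.0 if the target was ranked within the top k."""
    return 1.0 if rank is not None and rank <= k else 0.0


# Hypothetical response with irregular spacing in one title.
response = "1. Halo Infinite\n2. Stardew   Valley\n3. Portal 2"
pool = ["Portal 2", "Stardew Valley", "Halo Infinite", "Celeste"]
rank = detect_rank(response, pool, "Stardew Valley")
```

Plain substring matching misfires when one title is contained in another, so a real pipeline would likely need stricter matching, for example on item IDs.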
