A Reproducibility Analysis of PO4ISR: Diagnosing and Mitigating Semantic Drift in LLM-Based Session Recommendation

Aditya Tiwari; Konduri Naga Lakshmi Rekha; Rajesh Kumar Mundotiya

arXiv:2605.18780·cs.IR·May 20, 2026

TL;DR

This paper tests whether PO4ISR's reasoning-based session recommendation reproduces across diverse semantic domains, finds that its prompts suffer contextual drift in long sessions, and proposes PO4ISR++, a more robust variant with reported gains of up to 54% on Games and 96% on Bundle.

Contribution

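The authors conduct a reproducibility study of PO4ISR, identify contextual drift in long sessions as a critical failure mode, and introduce PO4ISR++, a dynamic prompting method that integrates reflexive prompting and consistent rank detection to improve stability and performance across multiple datasets.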

Findings

01. Original PO4ISR struggles with semantic drift in new domains.

02. PO4ISR++ restores and enhances performance, with up to 54% gain on Games and 96% on Bundle.

03. Open-source artifacts are released for future research.

Abstract

Reasoning-based Large Language Models (LLMs) like PO4ISR have set new benchmarks in session-based recommendation. However, the reproducibility of their reasoning capabilities across diverse semantic domains remains unexplored. In this work, we conduct a rigorous reproducibility study of PO4ISR to assess its generalization limits. Our analysis reveals a critical failure mode: standard reasoning prompts suffer from severe contextual drift in long sessions, leading to performance degradation on semantically complex datasets like Games and Bundle. To quantify and resolve this stability gap, we introduce PO4ISR++, a robustness-enhanced implementation that integrates reflexive prompting and consistent rank detection. Unlike the original static prompting strategy, our approach dynamically adapts to cross-domain cues. We benchmark both the original implementation and our robust variant on…
