Causal Direct Preference Optimization for Distributionally Robust Generative Recommendation
Chu Zhao, Enneng Yang, Jianzhe Zhao, Guibing Guo

TL;DR
This paper introduces CausalDPO, a causal extension of DPO, to improve the out-of-distribution robustness of LLM-based recommendation systems by mitigating environmental confounders through invariance learning.
Contribution
We propose CausalDPO, which incorporates causal invariance learning and backdoor adjustment to enhance the generalization of recommendation models across diverse environments.
Findings
CausalDPO outperforms DPO in OOD scenarios with a 17.17% average improvement.
Theoretical analysis confirms CausalDPO's ability to capture stable user preferences.
Extensive experiments validate the effectiveness of the proposed causal approach.
Abstract
Direct Preference Optimization (DPO) guides large language models (LLMs) to generate recommendations aligned with users' historical behavior distributions by minimizing a preference alignment loss. However, our systematic empirical study and theoretical analysis reveal that DPO tends to amplify spurious correlations caused by environmental confounders during the alignment process, significantly undermining the generalization capability of LLM-based generative recommendation methods in out-of-distribution (OOD) scenarios. To mitigate this issue, we propose CausalDPO, an extension of DPO that incorporates a causal invariance learning mechanism. This method introduces a backdoor adjustment strategy during the preference alignment phase to eliminate interference from environmental confounders, explicitly models the latent environmental distribution using a soft clustering approach, and…
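For readers unfamiliar with the preference alignment loss the abstract refers to, the following is a minimal sketch of the standard per-pair DPO objective only. It is not CausalDPO: the backdoor adjustment and soft clustering of environments are not reproduced here, since the abstract above is truncated before describing them. The function and parameter names (`dpo_loss`, `policy_chosen_logp`, `beta`, and so on) are illustrative assumptions, and the inputs are assumed to be summed sequence log-probabilities of the chosen and rejected items under the policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature; not taken from the paper
) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    loss = -log sigmoid(beta * [(log pi(y_w) - log pi_ref(y_w))
                                - (log pi(y_l) - log pi_ref(y_l))])
    """
    # Implicit reward margins relative to the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # When policy and reference agree, the loss equals log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # The loss drops as the policy favors the chosen item more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

Because this objective fits whatever preferences appear in the logged pairs, it makes no distinction between stable preferences and environment-specific correlations, which is the weakness the abstract attributes to DPO.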
Taxonomy
Topics: Recommender Systems and Techniques · Constraint Satisfaction and Optimization · Mobile Crowdsensing and Crowdsourcing
