Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management
Yanan Wang, Yong Ge

TL;DR
This paper introduces a novel preference finetuning method for large language models to generate domain-specific online review responses, addressing hallucinations and aligning with human preferences.
Contribution
The authors develop a new preference finetuning approach with context augmentation, theory-driven preference pair construction, curriculum learning, and a density estimation support constraint.
Findings
The proposed method effectively reduces hallucinations in review responses.
It better aligns generated responses with human preferences.
The approach outperforms existing offline preference finetuning methods.
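For context on what an offline preference finetuning objective looks like, here is a minimal sketch of the standard Direct Preference Optimization (DPO) loss for a single preference pair. This is not the authors' method, and this summary does not say which baselines the paper compares against. DPO is used here only as a well-known example of the family. The function name `dpo_loss`, its parameter names, and the default `beta` are assumptions made for illustration.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the policy being
    finetuned and under a frozen reference model. The names and the
    default beta are illustrative assumptions, not taken from the paper.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) computed in a numerically stable way.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy prefers the chosen response more than the reference does:
    # the loss falls below log(2).
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

The loss shrinks as the policy raises the likelihood of the preferred response relative to the reference model, which is the sense in which such methods align outputs with human preferences using a fixed, pre-collected dataset.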
Abstract
Online reviews play a pivotal role in consumers' decision-making processes. Existing research has highlighted the significant impact of managerial review responses on customer relationship management and firm performance. However, a large portion of online reviews remains unaddressed because of the considerable human labor required to keep up with their rapid growth. While generative AI models have achieved remarkable success in a range of tasks, they are general-purpose and may not align well with domain-specific human preferences. Finetuning is commonly employed to tailor these general models to domain-specific applications. Nevertheless, several challenges persist in finetuning with domain-specific data, including hallucinations, difficulty in representing domain-specific human preferences, and over-conservatism in offline policy optimization. To…
