Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering
Anas Mohamed, Azal Ahmad Khan, Xinran Wang, Ahmad Faraz Khan, Shuwen Ge, Saman Bahzad Khan, Ayaan Ahmad, Ali Anwar

TL;DR
Sem-DPO enhances preference optimization in prompt engineering by maintaining semantic consistency, leading to improved image and language model outputs while providing theoretical bounds on semantic drift.
Contribution
Introduces Sem-DPO, a novel variant of DPO that incorporates semantic-aware weighting to reduce semantic drift in prompt optimization.
Findings
Sem-DPO achieves 8-12% higher CLIP similarity than DPO.
Sem-DPO attains 5-9% higher human-preference scores.
Sem-DPO outperforms state-of-the-art baselines on standard benchmarks.
Abstract
Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to RL for automatic prompt engineering, but its token-level regularization leaves semantic inconsistency unchecked as prompts that win higher preference scores can still drift away from the user's intended meaning. We introduce Sem-DPO, a variant of DPO that preserves semantic consistency yet retains its simplicity and efficiency. Sem-DPO adjusts the DPO loss using a weight based on how different the winning prompt is from the original, reducing the impact of training examples that are semantically misaligned. We provide the first analytical bound on semantic drift for preference-tuned prompt generators, showing that Sem-DPO keeps learned prompts within a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
