Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data
Christopher Lee Lübbers

TL;DR
This paper shows that training with human-ranked data and Direct Preference Optimization (DPO) improves paraphrase-type generation accuracy by 3 percentage points and human preference ratings by 7 percentage points, making semantic paraphrasing models more reliable.
Contribution
It introduces a DPO-based training approach with human-annotated data to enhance paraphrase-type generation accuracy and human preference alignment.
Findings
DPO training increases paraphrase-type generation accuracy by 3 percentage points over a supervised baseline.
Human preference ratings improve by 7 percentage points.
Paraphrase-type detection achieves high F1 scores, e.g., 0.91 for addition/deletion.
Abstract
Paraphrasing re-expresses meaning to enhance applications like text simplification, machine translation, and question-answering. Specific paraphrase types facilitate accurate semantic analysis and robust language models. However, existing paraphrase-type generation methods often misalign with human preferences due to reliance on automated metrics and limited human-annotated training data, obscuring crucial aspects of semantic fidelity and linguistic transformations. This study addresses this gap by leveraging a human-ranked paraphrase-type dataset and integrating Direct Preference Optimization (DPO) to align model outputs directly with human judgments. DPO-based training increases paraphrase-type generation accuracy by 3 percentage points over a supervised baseline and raises human preference ratings by 7 percentage points. A newly created human-annotated dataset supports more…
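To make the alignment step concrete, here is a minimal sketch of the standard sigmoid-form DPO objective for a single preference pair. It is not the paper's training code: the function name `dpo_loss`, the default `beta=0.1`, and the example log-probabilities are illustrative assumptions. The inputs are the summed token log-probabilities of the human-preferred ("chosen") and less-preferred ("rejected") paraphrase under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid-form DPO loss for one (chosen, rejected) pair.

    All names and the default beta are assumptions for illustration.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy favors the chosen paraphrase
# more strongly than the reference does, so the loss drops below log(2).
loss = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(loss)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the human-preferred paraphrase and rises when it favors the rejected one.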
Code & Models
- 🤗 cluebbers/Llama-3.1-8B-paraphrase-type-generation-etpc · model · 7 dl
- 🤗 cluebbers/Llama-3.1-8B-paraphrase-type-generation-etpc-apty-reward · model · 1 dl
- 🤗 cluebbers/Llama-3.1-8B-paraphrase-type-generation-apty-ipo · model · 6 dl
- 🤗 cluebbers/Llama-3.1-8B-paraphrase-type-generation-apty-sigmoid · model · 6 dl
- 🤗 cluebbers/deberta-base-paraphrase-detection-qqp · model · 6 dl
- 🤗 cluebbers/deberta-base-paraphrase-type-detection-etpc · model · 2 dl
- 🤗 cluebbers/bart-large-paraphrase-type-generation-etpc · model · 4 dl
- 🤗 cluebbers/bart-large-paraphrase-type-generation-apty-sigmoid · model · 3 dl
- 🤗 cluebbers/bart-large-paraphrase-type-generation-apty-ipo · model · 2 dl
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Topic Modeling
Methods: ALIGN · Shrink and Fine-Tune · Direct Preference Optimization
