VTechAGP: An Academic-to-General-Audience Text Paraphrase Dataset and Benchmark Models
Ming Cheng, Jiaying Gong, Chenhan Yuan, William A. Ingram, Edward Fox, Hoda Eldardiry

TL;DR
This paper introduces VTechAGP, a novel document-level dataset for academic-to-general-audience text paraphrasing, and proposes DSPT5, a dynamic soft prompt generative model that achieves results competitive with larger language models on this task.
Contribution
The paper provides the first academic-to-general paraphrase dataset and develops DSPT5, a novel dynamic soft prompt model with a contrastive-generative training approach.
Findings
DSPT5 achieves competitive results compared to larger models.
State-of-the-art LLMs underperform on this specific paraphrasing task.
The dataset enables benchmarking for academic-to-general-audience text paraphrasing.
Abstract
Existing text simplification or paraphrase datasets mainly focus on sentence-level text generation in a general domain. These datasets are typically developed without using domain knowledge. In this paper, we release a novel dataset, VTechAGP, which is the first academic-to-general-audience text paraphrase dataset consisting of document-level thesis and dissertation academic and general-audience abstract pairs from 8 colleges authored over 25 years. We also propose a novel dynamic soft prompt generative language model, DSPT5. For training, we leverage a contrastive-generative loss function to learn the keyword vectors in the dynamic prompt. For inference, we adopt a crowd-sampling decoding strategy at both semantic and structural levels to further select the best output candidate. We evaluate DSPT5 and various state-of-the-art large language models (LLMs) from multiple perspectives.…
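The abstract says that, at inference time, the best output is selected from several sampled candidates. One common reading of this kind of selection is consensus-based: keep the candidate that agrees most with the other candidates. The sketch below illustrates that idea only. The function names `jaccard` and `select_by_consensus` are hypothetical, and the token-overlap similarity is a stand-in assumption; the paper's actual semantic and structural scoring functions are not described in this summary.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (placeholder similarity, an assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_by_consensus(
    candidates: List[str],
    similarity: Callable[[str, str], float] = jaccard,
) -> str:
    """Return the candidate with the highest mean similarity to the other candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]

    def mean_agreement(i: int) -> float:
        # Average similarity of candidate i to every other candidate.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(similarity(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=mean_agreement)
    return candidates[best]


if __name__ == "__main__":
    samples = [
        "the model simplifies academic text",
        "the model simplifies academic abstracts",
        "the model rewrites academic abstracts",
        "bananas are yellow",
    ]
    print(select_by_consensus(samples))
```

Any pairwise scorer can be passed as `similarity`, so a semantic score (for example, embedding cosine similarity) and a structural score could be combined in one callable without changing the selection logic.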
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Natural Language Processing Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate · Focus
