Steering Language Generation: Harnessing Contrastive Expert Guidance and Negative Prompting for Coherent and Diverse Synthetic Data Generation
Charles O'Neill, Yuan-Sen Ting, Ioana Ciuca, Jack Miller, Thang Bui

TL;DR
STEER is a novel method for guiding large language models to generate synthetic data that is both coherent and diverse by using contrastive expert guidance and negative prompting at inference time.
Contribution
The paper introduces STEER, a new inference-time technique that balances semantic fidelity and diversity in synthetic data generation using logit reshaping.
Findings
STEER outperforms previous methods in balancing diversity and coherency.
It allows fine-tuned control over the diversity-coherency trade-off.
Demonstrates effectiveness across multiple tasks.
Abstract
Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility, which has numerous applications from downstream model training to practical data utilisation. However, contemporary models, despite their impressive capacities, consistently struggle to produce both coherent and diverse data. To address the coherency issue, we introduce contrastive expert guidance, where the difference between the logit distributions of fine-tuned and base language models is emphasised to ensure domain adherence. In order to ensure diversity, we utilise existing real and synthetic examples as negative prompts to the model. We deem this dual-pronged approach to logit reshaping as STEER: Semantic Text Enhancement via Embedding Repositioning. STEER operates at inference-time and systematically guides the LLMs to strike a balance between adherence to the data…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
MethodsBalanced Selection
