Open-domain Implicit Format Control for Large Language Model Generation
Yiqun Yao, Wenjia Ma, Xuezhi Fang, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang

TL;DR
This paper introduces a new framework for controlling the output format of large language models using user-provided QA pairs, enabling open-domain format adherence without sacrificing output quality.
Contribution
It proposes a novel method leveraging one-shot QA pairs for open-domain format control and provides a dataset and benchmark for evaluation.
Findings
LLMs struggle with open-domain format control.
The proposed dataset improves format adherence.
Format control does not degrade output quality.
Abstract
Controlling the format of outputs generated by large language models (LLMs) is a critical functionality in various applications. Current methods typically employ constrained decoding with rule-based automata or fine-tuning with manually crafted format instructions, both of which struggle with open-domain format requirements. To address this limitation, we introduce a novel framework for controlled generation in LLMs, leveraging user-provided, one-shot QA pairs. This study investigates LLMs' capabilities to follow open-domain, one-shot constraints and replicate the format of the example answers. We observe that this is a non-trivial problem for current LLMs. We also develop a dataset collection methodology for supervised fine-tuning that enhances the open-domain format control of LLMs without degrading output quality, as well as a benchmark on which we evaluate both the helpfulness and…
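To make the one-shot setting concrete, the sketch below assembles a prompt from a user-provided QA pair and a new question, so that the example answer implicitly demonstrates the desired format. The function name `build_format_prompt` and the template wording are illustrative assumptions, not the prompt used in the paper.

```python
def build_format_prompt(example_q: str, example_a: str, query: str) -> str:
    """Build a one-shot prompt in which the example answer shows the target format.

    The template text is a hypothetical illustration, not the paper's own prompt.
    """
    return (
        "Answer the final question, following the format of the example answer.\n\n"
        f"Example question: {example_q}\n"
        f"Example answer: {example_a}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # The example answer uses a "Name | Value" layout; no format rule is stated explicitly.
    prompt = build_format_prompt(
        example_q="What is the capital of France?",
        example_a="Capital | Paris",
        query="What is the capital of Japan?",
    )
    print(prompt)
```

The format requirement here is never written out as an instruction; the model is expected to infer it from the example answer alone, which is the capability the abstract describes as non-trivial for current LLMs.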
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
