ReFF: Reinforcing Format Faithfulness in Language Models across Varied Tasks
Jiashu Yao, Heyan Huang, Zeming Liu, Haoyu Wen, Wei Su, Boao Qian, Yuhang Guo

TL;DR
This paper introduces FormatBench, a comprehensive benchmark for evaluating format faithfulness in large language models across diverse tasks, and proposes ReFF, a method to significantly improve LLMs' ability to generate correctly formatted outputs without sacrificing quality.
Contribution
The paper presents FormatBench, a new and diverse benchmark for format faithfulness, and proposes ReFF, a reinforcement approach that improves LLMs' format accuracy without requiring annotated data.
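The summary says ReFF needs no annotated data but does not state its training objective. The sketch below is a hedged illustration, not the authors' method. It shows why a decidable format check can supply a reward without labels: the reward depends only on the model output, never on a reference answer. The word-limit rule, the `max_words=5` budget, and the helper names `format_reward` and `advantages` are hypothetical. The group-mean baseline is a common policy-gradient choice and is swapped in here as an assumption.

```python
from statistics import mean


def format_reward(output: str, max_words: int = 5) -> float:
    """Hypothetical binary format reward: 1.0 if the output fits a word budget, else 0.0.

    No reference answer is needed, only the output string itself.
    """
    return 1.0 if len(output.split()) <= max_words else 0.0


def advantages(samples: list[str], max_words: int = 5) -> list[float]:
    """Reward minus the group mean over several samples for the same prompt.

    The mean baseline is an assumed, generic policy-gradient detail,
    not ReFF's stated objective.
    """
    if not samples:
        return []
    rewards = [format_reward(s, max_words) for s in samples]
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    samples = [
        "Yes.",
        "It is forty-two.",
        "Well, the answer to your question is probably forty-two.",
        "Forty-two, I think, but I am not sure.",
    ]
    print(advantages(samples))  # prints [0.5, 0.5, -0.5, -0.5]
```

Outputs that satisfy the format get a positive advantage and the rest get a negative one, so a policy-gradient update would push the model toward compliant outputs without any human labels.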
Findings
ReFF improves format faithfulness from 21.6% to 95.0% without quality loss.
ReFF enhances both format accuracy and general quality when combined with labeled data.
State-of-the-art LLMs still struggle with format faithfulness across varied tasks.
Abstract
Following formatting instructions to generate well-structured content is a fundamental yet often unmet capability of large language models (LLMs). To study this capability, which we refer to as format faithfulness, we present FormatBench, a comprehensive format-related benchmark. Compared to previous format-related benchmarks, FormatBench covers a greater variety of tasks in terms of application scenarios (traditional NLP tasks, creative works, autonomous agency tasks), human-LLM interaction styles (single-turn instruction, multi-turn chat), and format types (inclusion, wrapping, length, coding). Moreover, each task in FormatBench comes with a format checker program. Extensive experiments on the benchmark reveal that state-of-the-art open- and closed-source LLMs still suffer from a severe deficiency in format faithfulness. By virtue of the decidable nature of formats, we propose to…
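Because formats are decidable, a format checker can be an ordinary program that returns pass or fail. The following is a minimal sketch that assumes one simple rule for each of the four format types the abstract lists. The function names, the concrete rules (keyword lists, `<answer>` tags, word budgets, JSON as the example of a coding format), and the percentage-style `faithfulness_rate` are illustrative assumptions, not FormatBench's actual checkers.

```python
import json
import re
from typing import Callable

# Hypothetical checkers, one per format type named in the abstract.
# The concrete rules below are illustrative assumptions.


def check_inclusion(output: str, required: list[str]) -> bool:
    """Inclusion: every required keyword must appear in the output."""
    return all(word in output for word in required)


def check_wrapping(output: str, tag: str) -> bool:
    """Wrapping: the whole output must be enclosed in <tag>...</tag>."""
    t = re.escape(tag)
    pattern = rf"\s*<{t}>.*</{t}>\s*"
    return re.fullmatch(pattern, output, flags=re.DOTALL) is not None


def check_length(output: str, max_words: int) -> bool:
    """Length: the output must not exceed a word budget."""
    return len(output.split()) <= max_words


def check_coding(output: str) -> bool:
    """Coding: the output must parse as JSON (one assumed example of a code format)."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def faithfulness_rate(outputs: list[str], checker: Callable[[str], bool]) -> float:
    """Percentage of outputs that pass the checker (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return 100.0 * sum(checker(o) for o in outputs) / len(outputs)


if __name__ == "__main__":
    outputs = ["<answer>Paris</answer>", "Paris", "<answer>Lyon</answer> extra"]
    rate = faithfulness_rate(outputs, lambda o: check_wrapping(o, "answer"))
    print(f"{rate:.1f}%")  # prints 33.3%
```

Returning a plain `bool` keeps each checker composable: a task with several constraints can combine them with `and`, and the same function can score a model's outputs or serve as a pass/fail signal during training.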
Taxonomy
Topics: Topic Modeling
