ReFF: Reinforcing Format Faithfulness in Language Models across Varied Tasks

Jiashu Yao; Heyan Huang; Zeming Liu; Haoyu Wen; Wei Su; Boao Qian; Yuhang Guo

arXiv:2412.09173·cs.CL·December 13, 2024

TL;DR

This paper introduces FormatBench, a comprehensive benchmark for evaluating format faithfulness in large language models across diverse tasks, and proposes ReFF, a method to significantly improve LLMs' ability to generate correctly formatted outputs without sacrificing quality.

Contribution

The paper presents FormatBench, a new diverse benchmark for format faithfulness, and proposes ReFF, a novel reinforcement approach that enhances LLMs' formatting accuracy without needing annotated data.

Findings

01. ReFF improves format faithfulness from 21.6% to 95.0% without quality loss.

02. ReFF enhances both format accuracy and general quality when combined with labeled data.

03. State-of-the-art LLMs still struggle with format faithfulness across varied tasks.

Abstract

Following formatting instructions to generate well-structured content is a fundamental yet often unmet capability for large language models (LLMs). To study this capability, which we refer to as format faithfulness, we present FormatBench, a comprehensive format-related benchmark. Compared to previous format-related benchmarks, FormatBench covers a greater variety of tasks in terms of application scenes (traditional NLP tasks, creative works, autonomous agency tasks), human-LLM interaction styles (single-turn instruction, multi-turn chat), and format types (inclusion, wrapping, length, coding). Moreover, each task in FormatBench comes with a format checker program. Extensive experiments on the benchmark reveal that state-of-the-art open- and closed-source LLMs remain severely deficient in format faithfulness. By virtue of the decidable nature of formats, we propose to…
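To make the idea of a format checker program concrete, here is a minimal Python sketch of what checkers for the four listed format types might look like. It is an illustration only, not the FormatBench code: every function name and signature is hypothetical, and JSON validity is used merely as a stand-in assumption for the "coding" format type. Because each check is decidable, the share of outputs that pass can be computed without human annotation.

```python
import json
from typing import Callable, Iterable


def check_inclusion(text: str, required: Iterable[str]) -> bool:
    """Inclusion (hypothetical): every required keyword appears in the output."""
    return all(word in text for word in required)


def check_wrapping(text: str, open_tag: str, close_tag: str) -> bool:
    """Wrapping (hypothetical): the whole output sits between two delimiters."""
    stripped = text.strip()
    return (
        len(stripped) >= len(open_tag) + len(close_tag)
        and stripped.startswith(open_tag)
        and stripped.endswith(close_tag)
    )


def check_length(text: str, max_words: int) -> bool:
    """Length (hypothetical): the output has at most max_words whitespace-separated words."""
    return len(text.split()) <= max_words


def check_json(text: str) -> bool:
    """Coding (assumed stand-in): the output parses as valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def faithfulness_rate(outputs: Iterable[str], checker: Callable[[str], bool]) -> float:
    """Fraction of outputs that pass the checker (0.0 for an empty collection)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(checker(o) for o in outputs) / len(outputs)
```

For example, `faithfulness_rate(model_outputs, check_json)` would give the fraction of a model's responses that satisfy the assumed JSON constraint, where `model_outputs` is any list of response strings.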

Code & Models

Repositories

BITHLP/ReFF
PyTorch · Official

Videos

ReFF: Reinforcing Format Faithfulness in Language Models across Varied Tasks

Taxonomy

Topics: Topic Modeling