FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability
Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin, Caiming Xiong

TL;DR
FoFo is a new benchmark designed to evaluate large language models' ability to accurately follow complex, domain-specific formats, addressing a critical gap in existing assessment tools for AI agents.
Contribution
The paper introduces FoFo, the first comprehensive benchmark for assessing LLMs' format-following capabilities across diverse real-world formats and domains.
Findings
Open-source models lag behind closed-source models in format adherence.
Format-following performance is independent of content quality.
Proficiency varies across different domains.
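To make the second finding concrete, the sketch below shows how a response can carry the right content and still fail a format requirement. It is a minimal illustration only, not the benchmark's evaluation procedure: the JSON schema, the field names in `REQUIRED_FIELDS`, the helper functions, and the sample responses are all hypothetical.

```python
import json

# Hypothetical format specification: a JSON object with exactly these
# keys, each holding a value of the given type. Not taken from FoFo.
REQUIRED_FIELDS = {"patient_id": str, "diagnosis": str, "dosage_mg": int}


def follows_format(response: str) -> bool:
    """Return True if the response is a JSON object with exactly the
    required keys and exactly the required value types."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict) or set(parsed) != set(REQUIRED_FIELDS):
        return False
    # Strict type comparison so that, e.g., True is not accepted as an int.
    return all(type(parsed[key]) is expected
               for key, expected in REQUIRED_FIELDS.items())


def format_adherence_rate(responses: list[str]) -> float:
    """Fraction of responses that satisfy the format check."""
    if not responses:
        return 0.0
    return sum(follows_format(r) for r in responses) / len(responses)


# Hypothetical model outputs for the same prompt.
responses = [
    # Follows the format.
    '{"patient_id": "A17", "diagnosis": "influenza", "dosage_mg": 75}',
    # Same content, but written as free text instead of JSON.
    "Patient A17 has influenza; prescribe 75 mg.",
    # Valid JSON, but dosage_mg is a string rather than an integer.
    '{"patient_id": "A17", "diagnosis": "influenza", "dosage_mg": "75"}',
]

rate = format_adherence_rate(responses)
```

All three responses convey the same information, yet only the first passes the check, which is why format adherence has to be measured separately from content quality.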
Abstract
This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents. Despite LLMs' advancements, existing benchmarks fail to assess their format-following proficiency adequately. FoFo fills this gap with a diverse range of real-world formats and instructions, developed through an AI-Human collaborative method. Our evaluation across both open-source (e.g., Llama 2, WizardLM) and closed-source (e.g., GPT-4, PALM2, Gemini) LLMs highlights three key findings: open-source models significantly lag behind closed-source ones in format adherence; LLMs' format-following performance is independent of their content generation quality; and LLMs' format proficiency varies across different domains. These insights suggest the need for specialized tuning…
Taxonomy
Topics: Library Science and Information Systems
Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Byte Pair Encoding · Multi-Head Attention · Softmax · Dense Connections · Label Smoothing · Adam
