StructuredRAG: JSON Response Formatting with Large Language Models

Connor Shorten; Charles Pierse; Thomas Benjamin Smith; Erika Cardenas; Akanksha Sharma; John Trengrove; Bob van Luijt

arXiv:2408.11061 · cs.CL · August 22, 2024 · 3 cites

Open Access

TL;DR

StructuredRAG is a benchmark for evaluating how reliably LLMs generate JSON responses. Results show high variability in performance, influenced by task complexity, and motivate further research into improving structured output reliability.

Contribution

This work presents StructuredRAG, a new benchmark with evaluation strategies, and provides insights into factors affecting LLMs' structured output generation performance.

Findings

01 Average success rate of 82.55% across tasks

02 High variance in performance, from 0% to 100%

03 Task complexity impacts output accuracy

Abstract

The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON, is crucial for their use in Compound AI Systems. However, evaluating and improving this capability remains challenging. In this work, we introduce StructuredRAG, a benchmark of six tasks designed to assess LLMs' proficiency in following response format instructions. We evaluate two state-of-the-art LLMs, Gemini 1.5 Pro and Llama 3 8B-instruct with 4-bit quantization, using two distinct prompting strategies, which we introduce as f-String and Follow the Format (FF) prompting. Across 24 experiments, we find an average success rate of 82.55%. We further find a high variance in performance across tasks, models, and prompting strategies, with success rates ranging from 0% to 100%. We find that Llama 3 8B-instruct often performs competitively with Gemini 1.5 Pro. We observe that task…
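To make the notion of a "success rate" for response formatting concrete, here is a minimal sketch of how such a metric could be computed. It is not the benchmark's actual scoring code, which is not shown on this page. The function names `follows_format` and `success_rate`, the expected-schema dictionary, and the sample responses are all hypothetical, and the strictness rules (exact key match, exact value types) are assumptions made for illustration.

```python
import json


def follows_format(raw: str, expected: dict[str, type]) -> bool:
    """Return True if `raw` parses as a JSON object with exactly the expected keys and types.

    Hypothetical check: the strictness rules here are assumptions, not the paper's definition.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not valid JSON at all (e.g. free-form prose)
    if not isinstance(parsed, dict):
        return False  # valid JSON, but not an object
    if set(parsed) != set(expected):
        return False  # missing or extra keys
    for key, expected_type in expected.items():
        value = parsed[key]
        # bool is a subclass of int in Python, so reject it unless bool was requested
        if isinstance(value, bool) and expected_type is not bool:
            return False
        if not isinstance(value, expected_type):
            return False
    return True


def success_rate(responses: list[str], expected: dict[str, type]) -> float:
    """Percentage of responses that satisfy the expected format."""
    if not responses:
        return 0.0
    passed = sum(follows_format(r, expected) for r in responses)
    return 100.0 * passed / len(responses)


# Hypothetical task: the model must answer with {"answer": <string>}
EXPECTED = {"answer": str}
responses = [
    '{"answer": "Paris"}',          # valid
    "The answer is Paris.",         # prose, not JSON
    '{"answer": 42}',               # wrong value type
    '{"answer": "x", "extra": 1}',  # unexpected extra key
]
print(success_rate(responses, EXPECTED))  # 25.0
```

A strict check like this treats any deviation as a failure, so a model that wraps correct JSON in explanatory prose scores the same as one that ignores the format entirely. A more lenient scorer could extract the first JSON object from the response before validating it.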

Taxonomy

Topics: Topic Modeling