Analyzing LLM Instruction Optimization for Tabular Fact Verification
Xiaotang Du, Giwon Hong, Wai-Chung Kwan, Rohit Saxena, Ivan Titov, Pasquale Minervini, Emily Allaway

TL;DR
This paper systematically compares instruction optimization techniques for large language models in tabular fact verification, demonstrating consistent accuracy improvements and analyzing the effects of different optimizers and prompting methods.
Contribution
The paper presents a comprehensive evaluation of instruction optimization methods for tabular fact verification, built on the DSPy framework, and identifies which optimizers (COPRO, MiPROv2, SIMBA) pair best with which prompting strategies.
Findings
MiPROv2 provides stable gains for Chain-of-Thought prompting.
SIMBA yields the largest benefits for ReAct agents, especially at larger scales.
Instruction optimization consistently improves verification accuracy.
Abstract
Instruction optimization provides a lightweight, model-agnostic approach to enhancing the reasoning performance of large language models (LLMs). This paper presents the first systematic comparison of instruction optimization, based on the DSPy optimization framework, for tabular fact verification. We evaluate four out-of-the-box prompting techniques that cover both text-only prompting and code use: direct prediction, Chain-of-Thought (CoT), ReAct with SQL tools, and CodeAct with Python execution. We study three optimizers from the DSPy framework -- COPRO, MiPROv2, and SIMBA -- across four benchmarks and three model families. We find that instruction optimization consistently improves verification accuracy, with MiPROv2 yielding the most stable gains for CoT, and SIMBA providing the largest benefits for ReAct agents, particularly at larger model scales. Behavioral analyses reveal that…
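To make the task concrete, the sketch below illustrates the kind of step a SQL-tool agent performs when verifying a claim against a table: run a query, then compare the result with what the claim asserts. It is only a minimal illustration under stated assumptions, not the paper's pipeline. The table contents, the `scores` schema, the helper names `build_table` and `verify`, and the hand-written queries are all hypothetical; in the setting the abstract describes, an LLM agent would generate the queries itself.

```python
import sqlite3

# Hypothetical toy table; not taken from any benchmark in the paper.
ROWS = [
    ("Alice", "Red", 31),
    ("Bob", "Blue", 27),
    ("Cara", "Red", 35),
]


def build_table(rows):
    """Load the toy rows into an in-memory SQLite table named `scores`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (player TEXT, team TEXT, points INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    return conn


def verify(conn, sql, claimed_value):
    """Run a single-value query and compare its result with the claimed value."""
    (observed,) = conn.execute(sql).fetchone()
    return "entailed" if observed == claimed_value else "refuted"


conn = build_table(ROWS)

# Claim: "The Red team scored 66 points in total."
label_true = verify(
    conn, "SELECT SUM(points) FROM scores WHERE team = 'Red'", 66
)

# Claim: "Bob is the top scorer."
label_false = verify(
    conn, "SELECT player FROM scores ORDER BY points DESC LIMIT 1", "Bob"
)

print(label_true, label_false)
```

In this toy setup the verdict comes from an exact comparison with one query result; a real agent would also need to decide which query to run and how to handle claims that need several steps.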
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
