TL;DR
IF-CRITIC is a fine-grained, efficient LLM critic for instruction-following evaluation. It gives more accurate and reliable assessments than existing evaluators, so models can be trained on its reward signals at lower computational cost.
Contribution
The paper presents IF-CRITIC, an LLM critic that evaluates responses against generated constraint checklists and is trained on critique data collected through a multi-stage filtering mechanism. It outperforms strong LLM-as-a-Judge baselines.
Findings
IF-CRITIC surpasses strong LLM-as-a-Judge baselines in evaluation performance.
Using IF-CRITIC's reward signals enhances LLM instruction-following performance.
The approach reduces computational overhead compared to other LLM critic baselines.
Abstract
Instruction-following is a fundamental ability of Large Language Models (LLMs), requiring their generated outputs to follow multiple constraints imposed in input instructions. Numerous studies have attempted to enhance this ability through preference optimization or reinforcement learning based on reward signals from LLM-as-a-Judge. However, existing evaluation models for instruction-following still possess many deficiencies, such as substantial costs and unreliable assessments. To this end, we propose IF-CRITIC, an LLM critic for fine-grained, efficient, and reliable instruction-following evaluation. We first develop a checklist generator to decompose instructions and generate constraint checklists. With the assistance of the checklists, we collect high-quality critique training data through a multi-stage critique filtering mechanism and employ a constraint-level preference…
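The checklist idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation. In IF-CRITIC the checklist is generated by a model and each constraint is judged by the trained critic, whereas here the checklist is written by hand and each judgment is a toy programmatic check. The names `Constraint`, `Judgment`, `evaluate`, and `reward` are hypothetical, and averaging the per-constraint judgments into a single reward is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    """One checklist item; `check` is a toy stand-in for an LLM critic's judgment."""
    description: str
    check: Callable[[str], bool]


@dataclass
class Judgment:
    """Constraint-level verdict for a single checklist item."""
    description: str
    satisfied: bool


def evaluate(response: str, checklist: List[Constraint]) -> List[Judgment]:
    """Judge the response against each constraint separately."""
    return [Judgment(c.description, c.check(response)) for c in checklist]


def reward(judgments: List[Judgment]) -> float:
    """Assumed aggregation (not from the paper): fraction of constraints satisfied."""
    if not judgments:
        return 0.0
    return sum(j.satisfied for j in judgments) / len(judgments)


# Hypothetical instruction: "Write about autumn in at most 20 words
# without using the word 'leaf'."
checklist = [
    Constraint("at most 20 words", lambda r: len(r.split()) <= 20),
    Constraint("does not use the word 'leaf'", lambda r: "leaf" not in r.lower()),
    Constraint("mentions autumn", lambda r: "autumn" in r.lower()),
]

response = "Autumn wind rises, a red leaf drifts to the pond, silence settles in."
judgments = evaluate(response, checklist)
score = reward(judgments)

for j in judgments:
    print(f"{j.description}: {'pass' if j.satisfied else 'fail'}")
print(f"reward = {score:.2f}")
```

Keeping one verdict per constraint, rather than a single overall score, is what makes the evaluation fine-grained: a response that breaks only one constraint can be told apart from one that breaks all of them.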
