Diagnosing Capability Gaps in Fine-Tuning Data

Saeid Asgari Taghanaki; Rakshanda Agarwal; Bruce Sun; Rohan Jha; Elias Stengel-Eskin; Sara Malvar; Rui Ying; Yifei Xu; Guilherme Potje; Tusher Chakraborty; Leonardo de Oliveira Nunes; Ranveer Chandra; Emre Kiciman

arXiv:2604.27547 · cs.LG · May 1, 2026

TL;DR

GoalCover is a framework that systematically identifies capability gaps in fine-tuning datasets for large language models, improving targeted training and downstream performance.

Contribution

It introduces an interactive goal decomposition and automated coverage assessment method to detect missing capabilities before fine-tuning.

Findings

01. GoalCover reliably distinguishes targeted capability impacts with 25.6% degradation.

02. Filtering data with GoalCover improves LLM-judge reward from 3.77 to 4.12.

03. Combining filtered data with synthetic samples yields the highest reward of 4.20.

Abstract

Fine-tuning large language models (LLMs) for domain-specific tasks requires training datasets that comprehensively cover the target capabilities a practitioner needs. Yet identifying which capabilities a dataset fails to support, and doing so before an expensive fine-tuning run, remains a largely unsolved problem. We introduce GoalCover, a framework that helps practitioners systematically detect capability gaps in fine-tuning datasets through interactive goal decomposition and automated coverage assessment. GoalCover guides a practitioner through structured decomposition of a high-level goal into atomic, independently evaluable subgoals; assigns each training sample an LLM-based alignment score against every subgoal; and surfaces missing capabilities through automated analysis of low-scoring sample explanations. We validate the framework along two complementary axes. First, through…
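The coverage assessment described in the abstract (score every training sample against every subgoal, then look for subgoals that few samples support) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the 0-to-1 score scale, both thresholds, and the example subgoals and samples are all assumptions made for illustration. The toy `keyword_scorer` stands in for the LLM-based alignment scorer, and gap detection here uses a simple coverage threshold rather than the paper's analysis of low-scoring sample explanations.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: (sample, subgoal) -> alignment score in [0, 1].
Scorer = Callable[[str, str], float]


def keyword_scorer(sample: str, subgoal: str) -> float:
    """Toy stand-in for an LLM judge: fraction of subgoal words found in the sample."""
    words = subgoal.lower().split()
    text = sample.lower()
    hits = sum(1 for w in words if w in text)
    return hits / len(words) if words else 0.0


def score_matrix(samples: List[str], subgoals: List[str], scorer: Scorer) -> List[List[float]]:
    """One row per sample, one column per subgoal."""
    return [[scorer(s, g) for g in subgoals] for s in samples]


def subgoal_coverage(matrix: List[List[float]], subgoals: List[str],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Fraction of samples whose score for each subgoal meets the (assumed) threshold."""
    n = len(matrix)
    return {
        g: (sum(1 for row in matrix if row[j] >= threshold) / n if n else 0.0)
        for j, g in enumerate(subgoals)
    }


def find_gaps(coverage: Dict[str, float], min_coverage: float = 0.2) -> List[str]:
    """Subgoals supported by fewer than min_coverage of the samples."""
    return [g for g, c in coverage.items() if c < min_coverage]


def filter_samples(samples: List[str], matrix: List[List[float]],
                   threshold: float = 0.5) -> List[str]:
    """Keep samples that align with at least one subgoal."""
    return [s for s, row in zip(samples, matrix) if max(row, default=0.0) >= threshold]


# Illustrative subgoals and samples (invented for this sketch).
subgoals = [
    "diagnose crop disease",
    "recommend irrigation schedule",
    "estimate fertilizer dosage",
]
samples = [
    "How to diagnose a crop disease from leaf spots",
    "Diagnose wheat disease in a crop field",
    "Recommend an irrigation schedule for maize",
    "Recipe for chocolate cake",
]

matrix = score_matrix(samples, subgoals, keyword_scorer)
coverage = subgoal_coverage(matrix, subgoals)
gaps = find_gaps(coverage)
kept = filter_samples(samples, matrix)

print(coverage)
print("Gaps:", gaps)
print("Kept:", len(kept), "of", len(samples))
```

In this toy run no sample addresses the fertilizer subgoal, so it is flagged as a gap, and the off-topic sample is dropped by the filter. In practice the scorer would be replaced by a call to an LLM judge, and the thresholds would need tuning per domain.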
