Evidence Over Plans: Online Trajectory Verification for Skill Distillation

Yang Zhou; Zihan Dong; Zhenting Wang; Can Jin; Shiyu Zhao; Bangwei Guo; Difei Gu; Linjun Zhang; Mu Zhou; Dimitris N. Metaxas

arXiv:2605.09192·cs.AI·May 12, 2026

TL;DR

This paper introduces SPARK, a method for online trajectory verification using the Posterior Distillation Index (PDI) to improve skill distillation grounded in environment interaction, outperforming prior approaches.

Contribution

The paper proposes PDI and SPARK for environment-grounded skill distillation, enabling online diagnostics and interventions to produce more effective skills.

Findings

01 SPARK-generated skills outperform no-skill baselines.

02 SPARK-generated skills surpass human-written skills on student models.

03 PDI-guided distillation yields efficient, transferable skills.

Abstract

Agent skills can remarkably improve task success rates by using human-written procedural documents, but their quality is difficult to assess without environment-grounded verification. Existing skill generation methods rely heavily on preference logs rather than direct environment interaction, often yielding negligible or even degraded gains. We identify this as a fundamental timing bottleneck: robust skills should be posterior-based, distilled from empirical environment interaction rather than from prior plans. In this study, we introduce the Posterior Distillation Index (PDI), a trajectory-level metric that quantifies how well a distilled skill is grounded in the task-environment evidence. To operationalize PDI, we present SPARK (Structured Pipelines for Autonomous Runnable tasKs and sKill generation), which preserves task execution evidence for full trajectory-level analysis. SPARK…

Code & Models

Repositories

EtaYang10th/spark-skills
github

Datasets

EtaYang10th/SPARK_PDI_Trajectory
dataset · 288 dl
