SkillVerse: Assessing and Enhancing LLMs with Tree Evaluation

Yufei Tian; Jiao Sun; Nanyun Peng; Zizhao Zhang

arXiv:2506.00319·cs.CL·June 3, 2025

TL;DR

SkillVerse is an unsupervised, tree-structured framework that evaluates and enhances large language models by diagnosing specific abilities and guiding improvements through hierarchical analysis.

Contribution

It introduces a novel tree-based diagnosis method for LLMs, enabling granular skill assessment and targeted performance enhancement.

Findings

1. Improves in-context learning performance by 25% by using a tree-search algorithm to select more informative few-shot demonstrations.

2. Predicts new model weaknesses with a 55% success rate.

3. Provides hierarchical insights into model capabilities.

Abstract

As language models evolve to tackle complex, multifaceted tasks, their evaluation must adapt to capture this intricacy. A granular, skill-specific understanding of model capabilities can empower researchers to make informed model development plans. In this paper, we introduce SkillVerse, an unsupervised tree-structured diagnosis framework for understanding model proficiency in specific abilities. With an LLM as a judge, SkillVerse first critiques the model responses and then organizes them into a hierarchical structure termed a dendrogram. Given proficiency at arbitrary levels of granularity, SkillVerse can flexibly produce insights into the behaviors of modern large models. We also demonstrate its efficacy in two downstream tasks: 1) improving model in-context learning by 25% using a tree-search algorithm to select more informative few-shot demonstrations, and 2) accurately predicting new…
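To make the idea of "proficiency at arbitrary levels of granularity" concrete, here is a minimal Python sketch. It is not the paper's implementation: the `SkillNode` class, the boolean judge verdicts, the skill names, and the toy data are all assumptions made for illustration. It assumes each leaf of a skill tree holds pass/fail verdicts from a judge and that a node's proficiency is the pass rate over its whole subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SkillNode:
    """One node of a hypothetical skill tree (illustrative, not the paper's structure)."""
    name: str
    children: List["SkillNode"] = field(default_factory=list)
    # Assumed judge verdicts: True means the response was judged adequate.
    verdicts: List[bool] = field(default_factory=list)

    def counts(self) -> Tuple[int, int]:
        """Return (passes, total) over every verdict in this subtree."""
        ok, total = sum(self.verdicts), len(self.verdicts)
        for child in self.children:
            child_ok, child_total = child.counts()
            ok += child_ok
            total += child_total
        return ok, total

    def proficiency(self) -> Optional[float]:
        """Pass rate over the subtree, or None if it holds no verdicts."""
        ok, total = self.counts()
        return ok / total if total else None


def nodes_at_depth(root: SkillNode, depth: int) -> List[SkillNode]:
    """Collect the nodes exactly `depth` levels below the root."""
    if depth == 0:
        return [root]
    found: List[SkillNode] = []
    for child in root.children:
        found.extend(nodes_at_depth(child, depth - 1))
    return found


def weakest(root: SkillNode, depth: int) -> SkillNode:
    """Return the lowest-proficiency node at the chosen granularity."""
    scored = [n for n in nodes_at_depth(root, depth) if n.proficiency() is not None]
    return min(scored, key=lambda n: n.proficiency())


# Toy data, invented for this example.
tree = SkillNode("all", children=[
    SkillNode("reasoning", children=[
        SkillNode("arithmetic", verdicts=[True, False, False, True]),
        SkillNode("logic", verdicts=[True, True, True, False]),
    ]),
    SkillNode("writing", children=[
        SkillNode("summarization", verdicts=[True, True]),
    ]),
])

print(tree.proficiency())          # overall pass rate
print(weakest(tree, 1).name)       # coarse view of the weakest skill
print(weakest(tree, 2).name)       # finer-grained view
```

Reading the same tree at depth 1 or depth 2 gives a coarse or a fine diagnosis from one set of verdicts, which is the kind of flexibility the abstract describes. How SkillVerse actually builds its dendrogram and scores nodes is not specified in this excerpt.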

Taxonomy

Topics: Wikis in Education and Collaboration · Semantic Web and Ontologies · Digital Rights Management and Security