JADE: Expert-Grounded Dynamic Evaluation for Open-Ended Professional Tasks

Lanbo Lin; Jiayao Liu; Tianyuan Yang; Li Cai; Yuanwu Xu; Lei Wei; Sicong Xie; Guannan Zhang

arXiv:2602.06486 · cs.AI · February 9, 2026

TL;DR

JADE is a two-layer evaluation framework, inspired by how human experts assess work, that balances stability and flexibility when evaluating open-ended professional tasks. This leads to better detection of agent failures and closer alignment with expert standards.

Contribution

JADE introduces a novel two-layer evaluation method combining stable skill-based criteria with dynamic, claim-level assessment for open-ended tasks.

Findings

01. Improves evaluation stability over LLM-only methods
02. Reveals critical failure modes missed by holistic evaluators
03. Successfully transfers to medical domain benchmarks

Abstract

Evaluating agentic AI on open-ended professional tasks faces a fundamental dilemma between rigor and flexibility. Static rubrics provide rigorous, reproducible assessment but fail to accommodate diverse valid response strategies, while LLM-as-a-judge approaches adapt to individual responses yet suffer from instability and bias. Human experts address this dilemma by combining domain-grounded principles with dynamic, claim-level assessment. Inspired by this process, we propose JADE, a two-layer evaluation framework. Layer 1 encodes expert knowledge as a predefined set of evaluation skills, providing stable evaluation criteria. Layer 2 performs report-specific, claim-level evaluation to flexibly assess diverse reasoning strategies, with evidence-dependency gating to invalidate conclusions built on refuted claims. Experiments on BizBench show that JADE improves evaluation stability and…
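Evidence-dependency gating, as described in the abstract, can be illustrated with a small sketch. It assumes that each claim extracted in Layer 2 carries a verdict and a list of the claims it builds on, and that a claim is invalidated when it is refuted itself or when any claim it depends on, directly or transitively, is invalidated. The `Claim` type, the verdict labels, and `gate_claims` are hypothetical names chosen for illustration, not the paper's implementation. Here only an explicit refutation invalidates a claim; how JADE treats unverified claims is not stated in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    claim_id: str
    # Hypothetical verdict labels: "supported", "refuted", or "unverified".
    verdict: str
    # IDs of claims this claim builds on (assumed to form a DAG).
    depends_on: list[str] = field(default_factory=list)


def gate_claims(claims: list[Claim]) -> dict[str, bool]:
    """Map claim_id -> True if the claim survives evidence-dependency gating.

    A claim fails if it is refuted or if any claim it depends on fails.
    Unknown dependency IDs raise KeyError; cycles raise ValueError.
    """
    by_id = {c.claim_id: c for c in claims}
    valid: dict[str, bool] = {}   # memoized gating results
    visiting: set[str] = set()    # claims on the current DFS path

    def check(cid: str) -> bool:
        if cid in valid:
            return valid[cid]
        if cid in visiting:
            raise ValueError(f"cyclic dependency at {cid!r}")
        visiting.add(cid)
        claim = by_id[cid]
        # Refuted claims fail outright; otherwise every dependency must pass.
        ok = claim.verdict != "refuted" and all(check(d) for d in claim.depends_on)
        visiting.discard(cid)
        valid[cid] = ok
        return ok

    for c in claims:
        check(c.claim_id)
    return valid


if __name__ == "__main__":
    report = [
        Claim("c1", "supported"),
        Claim("c2", "refuted"),
        Claim("c3", "supported", ["c1"]),
        Claim("conclusion", "supported", ["c2", "c3"]),
    ]
    # The conclusion is itself marked "supported" but rests on refuted c2.
    print(gate_claims(report))
    # → {'c1': True, 'c2': False, 'c3': True, 'conclusion': False}
```

Memoizing results means each claim is checked once even when several conclusions share it, and the explicit cycle check keeps a malformed dependency graph from recursing forever.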

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling