STAR: Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction
Xiaoxiao Wang, Chunxiao Li, Junying Wang, Yijin Guo, Zijian Chen, Chunyi Li, Xiaohong Liu, Zicheng Zhang, Guangtao Zhai

TL;DR
STAR is a novel framework that combines statistical expectations with agentic reasoning to accurately predict large model performance from limited data, improving reliability and interpretability.
Contribution
STAR introduces a hybrid approach that integrates knowledge retrieval, probabilistic modeling, and reasoning guided by Expectation Violation Theory (EVT) to improve performance prediction under data sparsity.
Findings
Achieves a 14.46% improvement over statistical methods in sparse settings.
Outperforms all baselines on score-based and rank-based metrics.
Effective with only 1-2 observed scores per model.
Abstract
As comprehensive large model evaluation becomes prohibitively expensive, predicting model performance from limited observations has become essential. However, existing statistical methods struggle with pattern shifts, data sparsity, and lack of explanation, while pure LLM methods remain unreliable. We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. STAR leverages specialized retrievers to gather external knowledge and embeds semantic features into Constrained Probabilistic Matrix Factorization (CPMF) to generate statistical expectations with uncertainty. A reasoning module guided by Expectation Violation Theory (EVT) then refines predictions through intra-family analysis, cross-model comparison, and credibility-aware aggregation, producing adjustments with traceable explanations. Extensive experiments show that STAR…
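To make the statistical half of the abstract concrete, here is a minimal sketch of plain probabilistic matrix factorization (PMF) on a sparse model-by-benchmark score matrix. It is not the paper's CPMF: it uses no constraints and no semantic features, and the uncertainty estimate (spread across random restarts) is a stand-in chosen for illustration. The function names `fit_pmf` and `predict_with_uncertainty`, the hyperparameters, and the toy scores are all assumptions, not values from the paper.

```python
import numpy as np

def fit_pmf(scores, mask, rank=2, lam=0.01, lr=0.05, steps=2000, seed=0):
    """MAP estimate of a plain PMF, fitted on observed entries only.

    scores: (n_models, n_benchmarks) array; entries where mask is False are ignored.
    mask:   boolean array of the same shape, True where a score was observed.
    """
    rng = np.random.default_rng(seed)
    filled = np.where(mask, scores, 0.0)  # neutralise NaNs in unobserved cells
    U = 0.1 * rng.standard_normal((scores.shape[0], rank))  # model factors
    V = 0.1 * rng.standard_normal((scores.shape[1], rank))  # benchmark factors
    for _ in range(steps):
        err = mask * (U @ V.T - filled)   # residual on observed cells only
        grad_U = err @ V + lam * U        # L2 term corresponds to a Gaussian prior
        grad_V = err.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def predict_with_uncertainty(scores, mask, n_runs=5, **kwargs):
    """Illustrative uncertainty: mean and std of predictions across restarts."""
    preds = []
    for seed in range(n_runs):
        U, V = fit_pmf(scores, mask, seed=seed, **kwargs)
        preds.append(U @ V.T)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

# Hypothetical toy data: 4 models x 3 benchmarks, NaN marks an unobserved score.
scores = np.array([
    [0.82, 0.75, np.nan],
    [0.78, np.nan, 0.60],
    [np.nan, 0.55, 0.48],
    [0.61, 0.50, np.nan],
])
mask = ~np.isnan(scores)
mean, std = predict_with_uncertainty(scores, mask)
```

Here `mean` holds a predicted score for every cell, including the unobserved ones, and `std` gives a rough per-cell spread. In the framework the abstract describes, an expectation of this kind with its uncertainty is what the reasoning module would then adjust; that refinement step is not sketched here.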
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Multimodal Machine Learning Applications
