Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Patara Trirat; Wonyong Jeong; Sung Ju Hwang

arXiv:2505.19764·cs.LG·March 2, 2026

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces Agentic Predictor, a lightweight, multi-view encoding-based model that efficiently predicts the success of LLM-based agentic workflows, reducing costly evaluations and improving configuration selection.

Contribution

It presents a novel multi-view workflow encoding technique combined with cross-domain unsupervised pretraining for accurate performance prediction in LLM workflows.

Findings

01. Outperforms graph-based baselines in accuracy
02. Reduces the number of evaluations needed for workflow optimization
03. Effective across multiple domains
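The second finding rests on a simple mechanism: a cheap predictor ranks many candidate workflows so that only a few of them need to be executed. The sketch below illustrates that screening idea under stated assumptions. `screen_then_evaluate`, `predict`, and `evaluate` are hypothetical names, not the API of the authors' repository, and the integer "workflows" with toy scoring functions stand in for real configurations.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

W = TypeVar("W")  # a workflow configuration of any type


def screen_then_evaluate(
    candidates: Sequence[W],
    predict: Callable[[W], float],
    evaluate: Callable[[W], float],
    k: int,
) -> Tuple[W, float, int]:
    """Rank candidates by a cheap predicted score and execute only the top-k.

    Returns the best evaluated workflow, its measured score, and the number
    of expensive evaluations spent.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Cheap pass: one predictor call per candidate, no workflow execution.
    ranked = sorted(candidates, key=predict, reverse=True)
    shortlist = ranked[:k]
    # Expensive pass: execute only the shortlisted workflows.
    measured = [(w, evaluate(w)) for w in shortlist]
    best, best_score = max(measured, key=lambda pair: pair[1])
    return best, best_score, len(shortlist)


# Hypothetical toy setup: workflows are integers, the true optimum is 7,
# and the predictor is a biased proxy of the true score.
calls: List[int] = []


def true_score(w: int) -> float:
    calls.append(w)  # record every expensive evaluation
    return -abs(w - 7)


def noisy_predict(w: int) -> float:
    return -abs(w - 7) + (0.5 if w % 2 else 0.0)


best, score, spent = screen_then_evaluate(range(20), noisy_predict, true_score, k=3)
print(best, score, spent)  # prints: 7 0 3
```

With k = 3, only three of the twenty candidates are executed. The trade-off is that a workflow the predictor ranks poorly is never evaluated, so the shortlist size k controls how much predictor error can be tolerated.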

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and communication patterns. Existing approaches often rely on heuristic-based tuning or exhaustive evaluation, which can be computationally expensive and suboptimal. This paper proposes Agentic Predictor, a lightweight predictor for efficient agentic workflow evaluation. Agentic Predictor is equipped with a multi-view workflow encoding technique that leverages multi-view representation learning of agentic systems by incorporating code architecture, textual prompts, and interaction graph features. To achieve high predictive accuracy while significantly reducing the number of required workflow evaluations for training a predictor, Agentic Predictor employs…
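The abstract describes encoding each workflow through several views (code architecture, textual prompts, interaction graph) and combining them. As a minimal sketch, assuming each view has already been encoded into a fixed-size vector by its own encoder, the views can be fused with softmax attention pooling against a learned query vector. This is a generic technique shown for illustration, not the paper's exact fusion layer; `attention_pool`, the query, and the embedding values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def attention_pool(views: Dict[str, Vector], query: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Fuse per-view embeddings into one vector using softmax attention weights."""
    names = list(views)
    dim = len(query)
    if any(len(views[n]) != dim for n in names):
        raise ValueError("all view embeddings must match the query dimension")
    # Relevance of each view: scaled dot product with the query.
    scores = [sum(q * x for q, x in zip(query, views[n])) / math.sqrt(dim) for n in names]
    # Numerically stable softmax over the views.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = {n: e / total for n, e in zip(names, exps)}
    # Fused representation: attention-weighted sum of the view embeddings.
    fused = [sum(weights[n] * views[n][i] for n in names) for i in range(dim)]
    return fused, weights


# Hypothetical 4-d embeddings from separate graph, code, and prompt encoders.
views = {
    "graph": [0.2, 0.1, 0.0, 0.5],
    "code": [0.9, 0.3, 0.4, 0.1],
    "prompt": [0.1, 0.8, 0.2, 0.3],
}
query = [1.0, 0.0, 0.0, 0.0]  # hypothetical learned query vector
fused, weights = attention_pool(views, query)
print({n: round(w, 3) for n, w in weights.items()})
```

The weights make the fusion inspectable: a view whose embedding aligns better with the query contributes more to the fused vector, and the weights always sum to one.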

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

The paper addresses a practically important challenge (reducing the computational cost of evaluating agentic workflows) and clearly articulates why existing execution-based approaches are inefficient for workflow optimization. The evaluation is thorough, including ablation studies, low-label regime analysis, OOD generalization tests, and workflow optimization experiments, demonstrating the predictor's effectiveness across multiple dimensions. The method consistently outperforms strong GNN-based baselines…

Weaknesses

1. The core contribution essentially combines existing techniques (multi-graph GNN, cross-view attention, contrastive pretraining) without fundamental innovation: the multi-view encoding is a relatively straightforward ensemble of three modality-specific encoders, and the pretraining strategy follows the standard contrastive + reconstruction objectives commonly used in multi-modal learning.
2. All experiments rely on a single benchmark (FLORA-Bench) with its specific workflow representation format; …
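Reviewer 01 characterizes the pretraining as standard contrastive + reconstruction objectives. For readers unfamiliar with that combination, here is a minimal, framework-free sketch of one common form: a one-directional InfoNCE loss that pulls matching rows of two view embeddings together, plus a mean-squared-error reconstruction term. The function names, the temperature of 0.1, and the weight `alpha` are hypothetical choices, not the paper's settings; practical implementations typically use a tensor library and often symmetrize the contrastive term.

```python
import math
from typing import List

Matrix = List[List[float]]


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def info_nce(z_a: Matrix, z_b: Matrix, temperature: float = 0.1) -> float:
    """One-directional InfoNCE: row i of z_a should match row i of z_b,
    and every other row of z_b serves as a negative."""
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [_cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        peak = max(logits)
        # log-sum-exp shifted by the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]  # cross-entropy with target index i
    return total / n


def mse(pred: Matrix, target: Matrix) -> float:
    """Reconstruction term: mean squared error over all entries."""
    errs = [(p - t) ** 2 for row_p, row_t in zip(pred, target) for p, t in zip(row_p, row_t)]
    return sum(errs) / len(errs)


def pretraining_loss(z_a: Matrix, z_b: Matrix, recon: Matrix, original: Matrix,
                     alpha: float = 1.0) -> float:
    """Contrastive alignment of two views plus weighted reconstruction."""
    return info_nce(z_a, z_b) + alpha * mse(recon, original)


# Hypothetical batch of two workflows embedded by a graph and a code encoder.
z_graph = [[1.0, 0.0], [0.0, 1.0]]
z_code_aligned = [[0.9, 0.1], [0.1, 0.9]]
z_code_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(z_graph, z_code_aligned) < info_nce(z_graph, z_code_swapped))  # True
```

The contrastive term is low when each workflow's two views agree with each other more than with other workflows' views, while the reconstruction term forces the embeddings to retain enough information to rebuild the input.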

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- The motivation (too many workflows, high cost to evaluate) is convincing.
- The multi-view encoding idea is intuitive and makes sense: workflows are complex, so capturing various facets (graph, code, prompt) is a logical approach.
- The pre-training across domains is a good practical touch: many real-world tasks have limited labels, so this should help generalization.

Weaknesses

In Table 3, it is not clear how the improvement percentages are computed. When comparing Agentic Predictor to the best baseline, the gains appear much smaller. It seems that the authors calculate improvements relative to the simplest baseline (MLP), which could be misleading. Are the other rows in Table 3 from the authors? Is the MLP the previous state-of-the-art baseline? Overall, the paper tends to overstate the magnitude of the improvements. The results are positive but more modest than implied…
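Reviewer 02's concern is easy to see with a short calculation: the same result yields very different improvement percentages depending on which baseline is the reference. The accuracies below are hypothetical values chosen for illustration, not numbers from Table 3, and `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(new: float, reference: float) -> float:
    """Relative gain of `new` over `reference`, in percent (higher is better)."""
    if reference == 0:
        raise ZeroDivisionError("reference score must be non-zero")
    return 100.0 * (new - reference) / abs(reference)


# Hypothetical accuracies (%), NOT taken from the paper's Table 3.
mlp, best_baseline, proposed = 70.0, 80.0, 82.0
print(round(relative_improvement(proposed, mlp), 2))            # 17.14 (vs. simplest baseline)
print(round(relative_improvement(proposed, best_baseline), 2))  # 2.5 (vs. strongest baseline)
```

Here, a 2-point gain over the strongest baseline becomes a roughly 17% improvement when measured against the weakest one.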

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. **Novel and well-motivated approach**: Decomposes agentic workflows into three complementary views (graph, code, prompt), balancing structural dependencies and semantic signals, which fits the heterogeneous, label-scarce nature of the problem.
2. **Effective pretraining strategy**: Cross-domain unsupervised pretraining significantly improves prediction quality in low-label scenarios, offering practical sample-efficiency advantages.
3. **Efficiency and practicality**: The lightweight predictor serves…

Weaknesses

**1. Missing Multi-graph Definition**
- The paper uses G_prompt, G_code, and G_operator, fused via CrossGraphAttn/ViewAttnPool, but does not clarify how the three graphs are constructed: whether they are isomorphic (sharing V and E); the source and representation of each graph's node/edge features (text embeddings, AST/CFG, operator types, etc.); how each is generated from W = {V, E, P, C}; and where the boundary lies between what is done "following" FLORA-Bench and what is new. The absence of formal definitions/pseudocode/illustration…
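The reviewer asks how the three graphs are derived from W = {V, E, P, C}. One possible construction, sketched here purely as an assumption and not as the authors' method, keeps a single shared topology (V, E) for all three views and varies only the node features: prompt text, code text, and an operator-type label per node. The toy hashed bag-of-words featurizer stands in for a real text or code encoder, and `Workflow`, `ViewGraph`, `build_views`, and the operator names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


@dataclass
class Workflow:
    """W = {V, E, P, C}, plus an assumed operator-type label for each node."""
    nodes: List[str]
    edges: List[Edge]
    prompts: Dict[str, str]
    code: Dict[str, str]
    operators: Dict[str, str]  # hypothetical: operator type per node


@dataclass
class ViewGraph:
    edges: List[Edge]
    features: Dict[str, List[float]]


def hashed_bow(text: str, dim: int = 8) -> List[float]:
    """Toy featurizer: hashed bag of words (crc32 keeps it deterministic)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec


def one_hot(label: str, vocab: List[str]) -> List[float]:
    return [1.0 if label == v else 0.0 for v in vocab]


def build_views(w: Workflow, op_vocab: List[str]) -> Dict[str, ViewGraph]:
    """Assumption: all views are isomorphic (same V and E), differing only in node features."""
    return {
        "prompt": ViewGraph(w.edges, {v: hashed_bow(w.prompts.get(v, "")) for v in w.nodes}),
        "code": ViewGraph(w.edges, {v: hashed_bow(w.code.get(v, "")) for v in w.nodes}),
        "operator": ViewGraph(w.edges, {v: one_hot(w.operators[v], op_vocab) for v in w.nodes}),
    }


wf = Workflow(
    nodes=["n1", "n2"],
    edges=[("n1", "n2")],
    prompts={"n1": "Solve the problem step by step", "n2": "Check the answer"},
    code={"n1": "answer = llm(prompt)", "n2": "return verify(answer)"},
    operators={"n1": "Generate", "n2": "Review"},
)
views = build_views(wf, op_vocab=["Generate", "Review", "Ensemble"])
print(views["operator"].features["n2"])  # [0.0, 1.0, 0.0]
```

Under this reading the three graphs differ only in what each node "says"; a non-isomorphic alternative (for example, a code view built from an AST) would need its own node and edge sets, which is exactly the ambiguity the reviewer flags.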

Code & Models

Repositories

deepauto-ai/agentic-predictor
PyTorch · Official

Taxonomy

Topics: Scientific Computing and Data Management · Data Stream Mining Techniques · Business Process Modeling and Analysis