Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
Patara Trirat, Wonyong Jeong, Sung Ju Hwang

TL;DR
This paper introduces Agentic Predictor, a lightweight, multi-view encoding-based model that efficiently predicts the success of LLM-based agentic workflows, reducing costly evaluations and improving configuration selection.
Contribution
It presents a novel multi-view workflow encoding technique combined with cross-domain unsupervised pretraining for accurate performance prediction in LLM workflows.
Findings
Outperforms graph-based baselines in accuracy
Reduces number of evaluations needed for workflow optimization
Effective across multiple domains
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, but optimizing LLM-based agentic systems remains challenging due to the vast search space of agent configurations, prompting strategies, and communication patterns. Existing approaches often rely on heuristic-based tuning or exhaustive evaluation, which can be computationally expensive and suboptimal. This paper proposes Agentic Predictor, a lightweight predictor for efficient agentic workflow evaluation. Agentic Predictor is equipped with a multi-view workflow encoding technique that leverages multi-view representation learning of agentic systems by incorporating code architecture, textual prompts, and interaction graph features. To achieve high predictive accuracy while significantly reducing the number of required workflow evaluations for training a predictor, Agentic Predictor employs…
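As a rough illustration of the multi-view idea in the abstract, the sketch below fuses three per-view embeddings (graph, code, prompt) with softmax attention weights and scores the fused vector with a logistic head. The function names, the query and head vectors, and all numbers are hypothetical and chosen only for illustration; this is not the paper's architecture, which the truncated abstract does not fully specify.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_views(views, query):
    """Attention-pool view embeddings: weight each view by softmax(query . view)."""
    names = list(views)
    weights = softmax([dot(query, views[n]) for n in names])
    dim = len(query)
    fused = [sum(w * views[n][i] for w, n in zip(weights, names)) for i in range(dim)]
    return fused, dict(zip(names, weights))

def predict_success(fused, head_weights, bias=0.0):
    # Logistic head mapping the fused embedding to a success probability.
    return 1.0 / (1.0 + math.exp(-(dot(fused, head_weights) + bias)))

# Hypothetical 3-dimensional embeddings, one per view of a single workflow.
views = {
    "graph": [0.2, 0.1, -0.3],
    "code": [0.5, -0.2, 0.1],
    "prompt": [-0.1, 0.4, 0.3],
}
query = [1.0, 0.5, -0.5]   # stand-in for a learned attention query
head = [0.8, -0.4, 0.6]    # stand-in for a learned prediction head

fused, weights = fuse_views(views, query)
prob = predict_success(fused, head)
print(weights, prob)
```

In a trained predictor the query and head would be learned from labeled workflows; here they are fixed so the example runs without any dependencies.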
Peer Reviews
Decision: ICLR 2026 Poster
The paper addresses a practically important challenge—reducing the computational cost of evaluating agentic workflows—and clearly articulates why existing execution-based approaches are inefficient for workflow optimization. The evaluation is thorough, including ablation studies, low-label regime analysis, OOD generalization tests, and workflow optimization experiments, demonstrating the predictor's effectiveness across multiple dimensions. The method consistently outperforms strong GNN-based ba
1. The core contribution essentially combines existing techniques (multi-graph GNN, cross-view attention, contrastive pretraining) without fundamental innovation. The multi-view encoding is a relatively straightforward ensemble of three modality-specific encoders, and the pretraining strategy follows standard contrastive + reconstruction objectives commonly used in multi-modal learning.
2. All experiments rely on a single benchmark (FLORA-Bench) with its specific workflow representation format;
- The motivation (too many workflows, high cost to evaluate) is convincing.
- The multi-view encoding idea is intuitive and makes sense: workflows are complex, so capturing various facets (graph, code, prompt) is a logical approach.
- The pre-training across domains is a good practical touch: many real-world tasks have limited labels, so this should help generalization.
In Table 3, it is not clear how the improvement percentages are computed. When comparing Agentic Predictor to the best baseline, the gains appear much smaller. It seems that the authors calculate improvements relative to the simplest baseline (MLP), which could be misleading. Are the other rows in Table 3 from the authors? Is the MLP the previous state-of-the-art baseline? Overall, the paper tends to overstate the magnitude of the improvements. The results are positive but more modest than implie
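The reviewer's point about the reference baseline can be made concrete with a small calculation. The sketch below computes the same relative-improvement formula against two different baselines. The scores are hypothetical and are not taken from the paper's Table 3; they only show how the choice of baseline changes the headline percentage.

```python
def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline` (both on the same metric)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline

# Hypothetical accuracies, NOT values from the paper.
scores = {"MLP": 0.70, "best_gnn_baseline": 0.80, "proposed": 0.84}

gain_vs_weakest = relative_improvement(scores["proposed"], scores["MLP"])
gain_vs_best = relative_improvement(scores["proposed"], scores["best_gnn_baseline"])

print(f"vs. MLP:           {gain_vs_weakest:.1f}%")  # 20.0%
print(f"vs. best baseline: {gain_vs_best:.1f}%")     # 5.0%
```

With these made-up numbers the same result reads as a 20% gain or a 5% gain depending on the reference point, which is why stating the baseline next to every reported percentage matters.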
1. **Novel and well-motivated approach**: Decomposes agentic workflows into three complementary views (graph, code, prompt), balancing structural dependencies and semantic signals, fitting the "heterogeneous, label-scarce" problem nature.
2. **Effective pretraining strategy**: Cross-domain unsupervised pretraining significantly improves prediction quality in low-label scenarios, offering practical sample efficiency advantages.
3. **Efficiency and practicality**: The lightweight predictor serves
**1. Missing Multi-graph Definition** - The paper uses G_prompt, G_code, G_operator fused via CrossGraphAttn/ViewAttnPool, but does not clarify how the three graphs are constructed: whether they are isomorphic (sharing V, E), the source and representation of each graph's node/edge features (text embeddings, AST/CFG, operator types, etc.), how they are specifically generated from W={V,E,P,C}, and the boundary with FLORA-Bench "following". The absence of formal definitions/pseudocode/illustration
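To make the reviewer's question concrete, the sketch below shows one possible reading, in which the three graphs are isomorphic (they share V and E) and differ only in node features. The `Workflow` fields, the interpretation of P as per-node prompts and C as per-node code, the `toy_features` featurizer, and the example workflow are all assumptions made for illustration. The review states that the paper itself does not specify this construction.

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    nodes: dict    # V: node id -> operator type (assumed encoding)
    edges: list    # E: (src, dst) pairs
    prompts: dict  # P: node id -> prompt text (assumed meaning)
    code: dict     # C: node id -> code snippet (assumed meaning)

def toy_features(text):
    # Placeholder featurizer; a real system would use a text or code embedding model.
    return [float(len(text)), float(len(text.split()))]

def build_views(wf):
    """Build three graphs that share topology but carry view-specific node features."""
    op_types = sorted(set(wf.nodes.values()))
    node_features = {
        # One-hot operator type per node.
        "G_operator": {n: [1.0 if t == op else 0.0 for op in op_types]
                       for n, t in wf.nodes.items()},
        "G_prompt": {n: toy_features(wf.prompts.get(n, "")) for n in wf.nodes},
        "G_code": {n: toy_features(wf.code.get(n, "")) for n in wf.nodes},
    }
    return {name: {"node_features": feats, "edges": list(wf.edges)}
            for name, feats in node_features.items()}

# Hypothetical three-step workflow.
wf = Workflow(
    nodes={"gen": "Generate", "review": "Review", "ens": "Ensemble"},
    edges=[("gen", "review"), ("review", "ens")],
    prompts={"gen": "Write a solution.", "review": "Check the solution for bugs."},
    code={"gen": "answer = llm(prompt)", "ens": "final = vote(candidates)"},
)
views = build_views(wf)
print(sorted(views))
```

The alternative reading, in which each view has its own node and edge sets (for example an AST or control-flow graph for the code view), would need a different construction and an explicit alignment between views.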
Taxonomy
Topics: Scientific Computing and Data Management · Data Stream Mining Techniques · Business Process Modeling and Analysis
