Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents

Heng Zhou; Zelin Tan; Zhemeng Zhang; Yutao Fan; Yibing Lin; Li Kang; Xiufeng Song; Rui Li; Songtao Huang; Ao Yu; Yuchen Fan; Yanxu Chen; Kaixin Xu; Xiaohong Liu; Yiran Qin; Philip Torr; Chen Zhang; Zhenfei Yin

arXiv:2604.06753 · cs.CL · April 9, 2026

TL;DR

This paper compares six inference-time reasoning paradigms for LLM agents across four models and ten benchmarks, and shows that a learned router for paradigm selection improves average accuracy over fixed paradigms.

Contribution

The paper introduces a lightweight, embedding-based learned router that selects the most suitable reasoning paradigm for each task before solving it, outperforming fixed paradigms and zero-shot self-routing.

Findings

01

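Reasoning structure helps on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval.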

02

Oracle per-task selection beats the best fixed paradigm by 17.1pp on average.

03

Across four models, the learned router improves average accuracy from 47.6% to 53.1%.
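
To make the second finding concrete, the sketch below computes the gap between the best fixed paradigm and oracle per-task selection from an accuracy table. The table values, the task names, and the helper names `best_fixed` and `oracle` are all made up for illustration and are not taken from the paper. It also assumes that "per-task" means one row per benchmark, which the summary above does not specify.

```python
# Hypothetical accuracies in percent. These numbers are illustrative only
# and do NOT come from the paper.
ACCURACY = {
    "task_a": {"Direct": 40.0, "CoT": 55.0, "ReAct": 70.0},
    "task_b": {"Direct": 80.0, "CoT": 65.0, "ReAct": 60.0},
    "task_c": {"Direct": 30.0, "CoT": 50.0, "ReAct": 45.0},
}


def best_fixed(acc):
    """Return (paradigm, mean accuracy) for the single paradigm with the
    highest mean accuracy when it is used on every task."""
    paradigms = next(iter(acc.values())).keys()
    means = {p: sum(row[p] for row in acc.values()) / len(acc) for p in paradigms}
    best = max(means, key=means.get)
    return best, means[best]


def oracle(acc):
    """Mean accuracy when the best paradigm is picked separately per task."""
    return sum(max(row.values()) for row in acc.values()) / len(acc)


if __name__ == "__main__":
    name, fixed_mean = best_fixed(ACCURACY)
    gap = oracle(ACCURACY) - fixed_mean
    print(f"best fixed: {name} ({fixed_mean:.1f}%), oracle gap: {gap:.1f}pp")
```

With these made-up numbers the best fixed paradigm is ReAct at a 58.3% mean, the oracle reaches 66.7%, and the gap is about 8.3pp. The oracle can never score below the best fixed paradigm, so the gap measures how much headroom a router could in principle recover.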

Abstract

When an LLM-based agent improves on a task, is the gain from the model itself or from the reasoning paradigm wrapped around it? We study this question by comparing six inference-time paradigms, namely Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode, across four frontier LLMs and ten benchmarks, yielding roughly 18,000 runs. We find that reasoning structure helps dramatically on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval. No single paradigm dominates, and oracle per-task selection beats the best fixed paradigm by 17.1pp on average. Motivated by this complementarity, we propose a select-then-solve approach: before answering each task, a lightweight embedding-based router selects the most suitable paradigm. Across four models, the router improves average accuracy from 47.6% to 53.1%, outperforming…
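
The select-then-solve idea can be sketched as a two-step pipeline: embed the task, pick a paradigm, then run that paradigm's solver. The code below is a minimal illustration under stated assumptions, not the paper's router. The bag-of-words `embed` function is a toy stand-in for a real embedding model, the nearest-centroid rule is one simple choice of router, and the training examples and stub solvers are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: L2-normalized bag of words.
    (Assumption: a real router would use a learned embedding model.)"""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def fit_router(examples):
    """Build one centroid per paradigm by averaging the embeddings of the
    training tasks labeled with that paradigm."""
    sums, sizes = {}, Counter()
    for text, paradigm in examples:
        centroid = sums.setdefault(paradigm, Counter())
        for tok, weight in embed(text).items():
            centroid[tok] += weight
        sizes[paradigm] += 1
    return {p: {t: w / sizes[p] for t, w in c.items()} for p, c in sums.items()}


def route(centroids, task):
    """Select the paradigm whose centroid has the largest dot product
    with the task embedding."""
    query = embed(task)

    def score(paradigm):
        centroid = centroids[paradigm]
        return sum(w * centroid.get(tok, 0.0) for tok, w in query.items())

    return max(centroids, key=score)


def select_then_solve(centroids, solvers, task):
    """Select a paradigm first, then run only that paradigm's solver."""
    paradigm = route(centroids, task)
    return paradigm, solvers[paradigm](task)


# Hypothetical training pairs: (task text, paradigm that worked best).
TRAIN = [
    ("write a python function to reverse a list", "Direct"),
    ("implement a python function that sorts numbers", "Direct"),
    ("search the web to find the population of a city", "ReAct"),
    ("look up and find the release year of a film on the web", "ReAct"),
]

# Stub solvers standing in for real paradigm implementations.
SOLVERS = {
    "Direct": lambda task: f"[Direct] {task}",
    "ReAct": lambda task: f"[ReAct] {task}",
}

if __name__ == "__main__":
    centroids = fit_router(TRAIN)
    print(select_then_solve(centroids, SOLVERS, "write a python function to add numbers"))
```

In this sketch the labels in `TRAIN` would come from running every paradigm on held-out tasks and recording which one succeeded. Routing costs one embedding and a few dot products per task, which is negligible next to running even one paradigm.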
