Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion
Xianjun Gao, Jianchun Liu, Hongli Xu, Liusheng Huang

TL;DR
Orion is a novel reasoning framework for LLMs that decomposes queries into key points and expands content in parallel, significantly improving speed and reasoning quality for web applications.
Contribution
Orion introduces dependency-aware query decomposition and logic-parallel expansion, enabling efficient and high-quality reasoning in LLMs for real-time web services.
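The summary above does not say how Orion schedules its expansion step, so the following is only a minimal sketch of the general idea: key points that do not depend on each other are expanded concurrently, while a dependent point waits for its prerequisites. The function names, the toy dependency graph, and the `expand_point` stand-in for an LLM call are all hypothetical and not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def expand_point(point, context):
    """Hypothetical stand-in for an LLM call that expands one key point."""
    used = ", ".join(sorted(context)) or "none"
    return f"[{point}] expanded (depends on: {used})"


def expand_in_dependency_order(dependencies, max_workers=4):
    """Expand key points wave by wave; points in the same wave run in parallel.

    `dependencies` maps each key point to the set of points it depends on.
    Returns (results, waves), where `waves` records what ran together.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    results, waves = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every point returned here has all of its prerequisites expanded.
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            contexts = [
                {dep: results[dep] for dep in dependencies.get(p, ())}
                for p in ready
            ]
            for point, text in zip(ready, pool.map(expand_point, ready, contexts)):
                results[point] = text
                sorter.done(point)
    return results, waves


# Toy dependency graph, invented for illustration only.
example_dependencies = {
    "define terms": set(),
    "gather facts": set(),
    "compare options": {"define terms", "gather facts"},
    "final answer": {"compare options"},
}
results, waves = expand_in_dependency_order(example_dependencies)
```

In this toy graph the first two points have no prerequisites and run in the same wave, and the remaining two run one after the other. Waiting for a whole wave to finish before starting the next is a simplification chosen for brevity; a finer-grained scheduler could start a point as soon as its own prerequisites complete.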
Findings
Up to 4.33x faster token generation
Up to 3.42x lower answer latency
18.75% improvement in reasoning quality
Abstract
The integration of Large Language Models (LLMs) into real-time Web applications, such as AI-powered search and conversational agents, presents a fundamental Web infrastructure challenge: reconciling the demand for high-quality, complex reasoning with the stringent low-latency and high-throughput requirements of interactive services. Current LLM reasoning, hindered by computationally inefficient sequential generation and rigid reasoning strategies, creates a critical bottleneck for Web services. Existing approaches typically optimize LLM reasoning for either efficiency or quality but struggle to achieve both, and thus fail to meet the dual requirements of modern Web platforms. To overcome these limitations, we propose Orion, a novel and efficient reasoning framework that enables dependency-aware query decomposition and logic-parallel content expansion. Concretely, Orion…
