SimpleTool: Parallel Decoding for Real-Time LLM Function Calling
Xiaoxin Shi, Jiaxin Wan, Linkang Dong, Wei Jiang, Yue Liu, Zengfeng Huang

TL;DR
SimpleTool introduces a parallel decoding method for LLM function calling that cuts end-to-end latency by 3-6x. It enables real-time applications by exploiting two properties of structured outputs: token redundancy and weak dependencies between arguments.
Contribution
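SimpleTool proposes a parallel decoding approach built on special tokens that play a dual role: they compress low-entropy tokens such as delimiters and parameter names, and they act as mode selectors so that the function name and arguments can be generated independently and in parallel.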
Findings
Achieves 3-6x end-to-end speedup with minimal overhead.
Maintains or improves accuracy across benchmarks.
Enables 16 Hz real-time control on consumer GPUs.
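To put the control-frequency figures in perspective, a rate in Hz converts directly into a per-call latency budget. The short sketch below does only that arithmetic; the helper name `latency_budget_ms` is hypothetical and not part of the paper's code.

```python
def latency_budget_ms(control_hz: float) -> float:
    """Return the time available per control step, in milliseconds."""
    if control_hz <= 0:
        raise ValueError("control_hz must be positive")
    return 1000.0 / control_hz


# 10 Hz is the example rate from the abstract; 16 Hz is the reported result.
for hz in (10, 16):
    print(f"{hz} Hz -> {latency_budget_ms(hz):.1f} ms per call")
```

At 16 Hz, each complete function call has to finish within 62.5 ms, compared with 100 ms at 10 Hz.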
Abstract
LLM-based function calling enables intelligent agents to interact with external tools and environments, yet autoregressive decoding imposes a fundamental latency bottleneck that limits real-time applications such as embodied intelligence, game AI, and interactive avatars (e.g., 10 Hz control frequency). We observe that function calling differs fundamentally from free-form text generation: structured outputs exhibit substantial token redundancy (delimiters, parameter names), and arguments exhibit weak causal dependencies. Crucially, these two properties must be exploited jointly to achieve real-time performance. We present SimpleTool, which introduces special tokens that serve a dual role: compressing low-entropy tokens (4-6x reduction) while acting as mode selectors that enable independent parallel generation of function name and arguments. This synergistic design achieves 3-6x…
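The toy model below suggests why the abstract says the two properties must be exploited jointly. It is a rough sketch under stated assumptions, not the authors' implementation. It counts sequential decoding steps only. It assumes each output stream (function name, each argument) can be decoded concurrently, and that redundant tokens are folded into special tokens that do not need to be decoded. The `Stream` class and all token counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """One independently decodable piece of a function call (hypothetical)."""
    name: str
    content_tokens: int    # tokens that carry information (values, names)
    redundant_tokens: int  # delimiters, parameter names, other low-entropy tokens


def sequential_steps(streams: list[Stream]) -> int:
    """Plain autoregressive decoding: one step per token, streams back to back."""
    return sum(s.content_tokens + s.redundant_tokens for s in streams)


def compressed_steps(streams: list[Stream]) -> int:
    """Compression only: redundant tokens dropped, streams still back to back."""
    return sum(s.content_tokens for s in streams)


def parallel_steps(streams: list[Stream]) -> int:
    """Parallelism only: streams run concurrently, redundant tokens still decoded."""
    return max(s.content_tokens + s.redundant_tokens for s in streams)


def joint_steps(streams: list[Stream]) -> int:
    """Both together: the longest compressed stream sets the step count."""
    return max(s.content_tokens for s in streams)


# Illustrative token counts only; these are NOT measurements from the paper.
call = [
    Stream("function_name", content_tokens=4, redundant_tokens=3),
    Stream("arg_0", content_tokens=3, redundant_tokens=5),
    Stream("arg_1", content_tokens=2, redundant_tokens=5),
]

print("sequential:", sequential_steps(call))
print("compressed only:", compressed_steps(call))
print("parallel only:", parallel_steps(call))
print("joint:", joint_steps(call))
```

In this toy setting, either technique alone shortens the sequential chain, but combining them shortens it the most. Step counts are not wall-clock time, so the ratios here should not be read as the paper's reported speedups.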
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Artificial Intelligence in Games · Parallel Computing and Optimization Techniques
