TimelyLLM: Segmented LLM Serving System for Time-sensitive Robotic Applications
Neiwen Ling, Guojun Chen, Lin Zhong

TL;DR
TimelyLLM is a new LLM serving system designed for time-sensitive robotic applications, introducing segmented generation and scheduling to significantly reduce waiting times and improve response efficiency.
Contribution
The paper presents TimelyLLM, a novel LLM serving system with segmented generation and scheduling mechanisms tailored for real-time robotic tasks, addressing limitations of existing FCFS approaches.
Findings
Up to 1.97x improvement in time utility
84% reduction in overall waiting time
Effective handling of time-sensitive robotic requests
Abstract
Large Language Models (LLMs) such as GPT-4 and Llama3 can already comprehend complex commands and process diverse tasks. This advancement facilitates their application in controlling drones and robots for various tasks. However, existing LLM serving systems typically employ a first-come, first-served (FCFS) batching mechanism, which fails to address the time-sensitive requirements of robotic applications. To address this gap, this paper proposes a new system, TimelyLLM, that serves multiple robotic agents with time-sensitive requests. TimelyLLM introduces novel mechanisms of segmented generation and scheduling that optimally leverage redundancy between robot plan generation and execution phases. We report an implementation of TimelyLLM on a widely used LLM serving framework and evaluate it on a range of robotic applications. Our evaluation shows that TimelyLLM improves the time utility up to…
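To make the FCFS limitation concrete, the toy simulation below compares serving requests in arrival order with serving them in deadline order. It is only an illustrative sketch: earliest-deadline-first is used as a simple stand-in for a time-aware policy and is not TimelyLLM's segmented scheduler, and the `Request` type, request names, service times, and deadlines are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    name: str            # hypothetical robot request label
    service_time: float  # seconds of generation time needed (assumed)
    deadline: float      # seconds from now by which the reply is needed (assumed)


def simulate(order):
    """Serve requests one at a time in the given order; return finish times."""
    clock = 0.0
    finish = {}
    for req in order:
        clock += req.service_time
        finish[req.name] = clock
    return finish


def missed(requests, finish):
    """Names of requests that finished after their deadline, sorted."""
    return sorted(r.name for r in requests if finish[r.name] > r.deadline)


# All three requests are assumed to arrive at time 0, in this order.
requests = [
    Request("arm_plan", service_time=4.0, deadline=10.0),
    Request("drone_avoid", service_time=1.0, deadline=2.0),
    Request("rover_route", service_time=2.0, deadline=4.0),
]

# FCFS: serve in arrival order.
fcfs_finish = simulate(requests)

# Stand-in time-aware policy: serve the tightest deadline first.
edf_finish = simulate(sorted(requests, key=lambda r: r.deadline))

print("FCFS missed:", missed(requests, fcfs_finish))
print("Deadline-ordered missed:", missed(requests, edf_finish))
```

In this toy setup the long, loosely constrained request at the head of the FCFS queue delays the two urgent ones past their deadlines, while deadline ordering meets all three with the same total work.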
Taxonomy
Topics: Industrial Automation and Control Systems · Sensor Technology and Measurement Systems · Network Time Synchronization Technologies
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout · Residual Connection · Multi-Head Attention · Adam
