Optimizing Small Language Models for In-Vehicle Function-Calling
Yahya Sowti Khiabani, Farris Atif, Chieh Hsu, Sven Stahlmann, Tobias Michels, Sebastian Kramer, Benedikt Heidrich, M. Saquib Sarfraz, Julian Merten, Faezeh Tafazzoli

TL;DR
This paper presents a comprehensive method for deploying small language models in vehicles by applying model compression techniques, enabling real-time, on-device function calling that improves vehicle control and user experience.
Contribution
It introduces a holistic approach combining compression, fine-tuning, and integration for small language models in vehicles, maintaining performance within hardware constraints.
Findings
Model size reduced by up to 2 billion parameters.
Achieves 11 tokens per second inference speed.
Maintains task accuracy despite significant compression.
Abstract
We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given the in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite significant reduction in model size which removes up to 2 billion parameters from the…
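To make the idea of an SLM acting as a function-calling agent concrete, the following is a minimal sketch of the host-side step that turns a model's structured output into a vehicle action. It assumes the model emits a JSON object with `name` and `arguments` fields; that format, the registry, and the functions `set_temperature` and `set_seat_heating` are hypothetical illustrations and are not taken from the paper.

```python
import json
from typing import Callable, Dict

# Hypothetical registry of callable vehicle functions. The names and
# signatures are illustrative assumptions, not the paper's API.
REGISTRY: Dict[str, Callable[..., str]] = {}


def vehicle_function(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the dispatcher can find it."""
    REGISTRY[fn.__name__] = fn
    return fn


@vehicle_function
def set_temperature(zone: str, celsius: float) -> str:
    return f"temperature[{zone}] set to {celsius:.1f} C"


@vehicle_function
def set_seat_heating(seat: str, level: int) -> str:
    if not 0 <= level <= 3:
        raise ValueError("level must be between 0 and 3")
    return f"seat heating[{seat}] set to level {level}"


def dispatch(model_output: str) -> str:
    """Parse a JSON function call and run the matching registered function."""
    try:
        call = json.loads(model_output)
        name = call["name"]
        args = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return f"error: malformed call ({exc.__class__.__name__})"

    fn = REGISTRY.get(name)
    if fn is None:
        return f"error: unknown function '{name}'"

    try:
        return fn(**args)
    except (TypeError, ValueError) as exc:
        # Wrong argument names, types, or out-of-range values end up here.
        return f"error: bad arguments ({exc})"


if __name__ == "__main__":
    # Assumed shape of a model response to "make it 21.5 degrees for the driver".
    print(dispatch('{"name": "set_temperature", '
                   '"arguments": {"zone": "driver", "celsius": 21.5}}'))
```

Unlike a rule-based system that needs a rule for every phrasing of a request, this dispatcher only validates and routes the call; mapping free-form speech to a function name and arguments is left to the model.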
Taxonomy
Topics: Transportation and Mobility Innovations · Electric and Hybrid Vehicle Technologies
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
