Optimizing Small Language Models for In-Vehicle Function-Calling

Yahya Sowti Khiabani; Farris Atif; Chieh Hsu; Sven Stahlmann; Tobias Michels; Sebastian Kramer; Benedikt Heidrich; M. Saquib Sarfraz; Julian Merten; Faezeh Tafazzoli

arXiv:2501.02342 · cs.LG · January 7, 2025 · 2 cites

TL;DR

This paper presents a comprehensive method for deploying small language models in vehicles by applying model compression techniques, enabling real-time, on-device function calling that improves vehicle control and user experience.

Contribution

It introduces a holistic approach combining compression, fine-tuning, and integration for small language models in vehicles, maintaining performance within hardware constraints.

Findings

01. Model size reduced by up to 2 billion parameters.

02. Achieves 11 tokens per second inference speed.

03. Maintains task accuracy despite significant compression.
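
To give a rough sense of why removing parameters and lowering precision matter on constrained hardware, the following sketch estimates the storage needed for model weights alone. It assumes a base size of about 3.8 billion parameters for Phi-3 mini (a publicly reported figure, not stated on this page), and the 16-bit and 4-bit widths are illustrative assumptions rather than the paper's reported settings. Activations, KV cache, and runtime overhead are ignored.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


BASE_PARAMS = 3.8e9     # assumption: publicly reported size of Phi-3 mini
REMOVED_PARAMS = 2.0e9  # from the findings: "up to 2 billion parameters"
PRUNED_PARAMS = BASE_PARAMS - REMOVED_PARAMS

# (label, parameter count, bits per weight); bit widths are illustrative only
scenarios = [
    ("base, 16-bit", BASE_PARAMS, 16),
    ("pruned, 16-bit", PRUNED_PARAMS, 16),
    ("pruned, 4-bit", PRUNED_PARAMS, 4),
]

for label, params, bits in scenarios:
    print(f"{label:>15}: {weight_footprint_gb(params, bits):.2f} GB")
```

Under these assumptions, pruning and quantization compound: each reduces the weight footprint independently, so applying both shrinks it far more than either alone.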

Abstract

We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given the in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite significant reduction in model size which removes up to 2 billion parameters from the…
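
The function-calling pattern the abstract describes can be illustrated with a minimal sketch: the model emits a structured call, and a thin dispatcher validates it before touching any vehicle function. Everything named here is hypothetical, including the JSON shape (`name`, `arguments`), the function names `set_temperature` and `set_seat_heating`, and the `REGISTRY` table; none of it is taken from the paper.

```python
import json
from typing import Callable, Dict


# Hypothetical vehicle functions; a real system would call into vehicle APIs.
def set_temperature(zone: str, celsius: float) -> str:
    return f"temperature[{zone}]={celsius:.1f}C"


def set_seat_heating(seat: str, level: int) -> str:
    return f"seat_heating[{seat}]={level}"


# Hypothetical registry: only functions listed here may be invoked.
REGISTRY: Dict[str, Callable[..., str]] = {
    "set_temperature": set_temperature,
    "set_seat_heating": set_seat_heating,
}


def dispatch(model_output: str) -> str:
    """Parse a model-emitted JSON call and run it if it is well formed."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: output is not valid JSON"
    if not isinstance(call, dict):
        return "error: call must be a JSON object"
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in REGISTRY:
        return f"error: unknown function {name!r}"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    try:
        return REGISTRY[name](**args)
    except TypeError as exc:  # missing or unexpected argument names
        return f"error: bad arguments for {name}: {exc}"


# Example: the string below stands in for text generated by the model.
print(dispatch('{"name": "set_temperature", '
               '"arguments": {"zone": "driver", "celsius": 21}}'))
```

Keeping the dispatcher strict (an allow-list plus argument checks) means a malformed or hallucinated call degrades into an error string instead of an unintended vehicle action.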

Taxonomy

Topics: Transportation and Mobility Innovations · Electric and Hybrid Vehicle Technologies