TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices
Mohd Ariful Haque (1), Fahad Rahman (2), Kishor Datta Gupta (1), Khalil Shujaee (1), Roy George (1) ((1) Clark Atlanta University, (2) United International University)

TL;DR
This paper evaluates and optimizes small language models for agentic tasks on edge devices, demonstrating that hybrid optimization strategies significantly improve their accuracy and stability for autonomous, privacy-preserving AI applications.
Contribution
It introduces a comprehensive evaluation framework and hybrid optimization methods for small language models, enabling effective agentic tasks on edge devices.
Findings
Medium-sized models outperform ultra-compact models in accuracy.
Hybrid optimization achieves up to 65.74% overall accuracy.
Small models can be effective for autonomous AI on edge devices.
Abstract
This paper investigates the effectiveness of small language models (SLMs) for agentic tasks (function/tool/API calling), with a focus on running agents on edge devices without reliance on cloud infrastructure. We evaluate SLMs using the Berkeley Function Calling Leaderboard (BFCL) framework and describe parameter-driven optimization strategies that include supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), reinforcement learning (RL)-based optimization, preference alignment via Direct Preference Optimization (DPO), and hybrid methods. We report results for models including TinyAgent, TinyLlama, Qwen, and xLAM across BFCL categories (simple, multiple, parallel, parallel-multiple, and relevance detection) in both live and non-live settings, as well as in multi-turn evaluations. We additionally detail a DPO training pipeline constructed from AgentBank data (e.g., ALFRED),…
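The abstract names Direct Preference Optimization as one of the alignment steps. As a reference point, the sketch below computes the standard DPO objective for a batch of preference pairs: the loss is -log σ(β · [(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). This is a minimal sketch, not the authors' pipeline. It assumes the summed token log-probabilities of the chosen (y_w) and rejected (y_l) responses have already been computed by the trainable policy and a frozen reference model. The names `PreferencePair`, `dpo_loss`, and `mean_dpo_loss` are hypothetical, and β = 0.1 is an assumed default rather than a value taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PreferencePair:
    """Hypothetical container: summed log-probs of chosen/rejected responses."""
    policy_chosen: float    # log pi_theta(y_w | x)
    policy_rejected: float  # log pi_theta(y_l | x)
    ref_chosen: float       # log pi_ref(y_w | x), frozen reference model
    ref_rejected: float     # log pi_ref(y_l | x)


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(pair: PreferencePair, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair (beta=0.1 is an assumed default)."""
    # Implicit reward of each response: how much the policy's log-prob
    # moved relative to the reference model.
    chosen_margin = pair.policy_chosen - pair.ref_chosen
    rejected_margin = pair.policy_rejected - pair.ref_rejected
    return neg_log_sigmoid(beta * (chosen_margin - rejected_margin))


def mean_dpo_loss(pairs: Sequence[PreferencePair], beta: float = 0.1) -> float:
    """Average DPO loss over a batch of preference pairs."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    return sum(dpo_loss(p, beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    untrained = PreferencePair(-10.0, -12.0, -10.0, -12.0)
    # Policy shifted probability toward the chosen response: lower loss.
    improved = PreferencePair(-8.0, -14.0, -10.0, -12.0)
    print(dpo_loss(untrained), dpo_loss(improved))
```

When the policy still matches the reference model, every pair contributes log 2. The loss falls only as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference. The reference term keeps the fine-tuned model from drifting arbitrarily far from its starting point.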
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning and Data Classification · Big Data and Digital Economy
