Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R

Pablo Robin Guerrero; Yueyang Pan; Sanidhya Kashyap

arXiv:2507.08505·cs.LG·July 15, 2025

TL;DR

This paper evaluates the deployment of vision-language models on the OnePlus 13R, identifying hardware bottlenecks and providing benchmarks and profiling tools to improve real-time mobile performance.

Contribution

It offers a comprehensive benchmarking and profiling analysis of VLM deployment frameworks on mobile devices, revealing key hardware utilization issues.

Findings

1. CPU over-utilization during token generation
2. GPU and NPU underutilization and saturation issues
3. Framework-level benchmarks and profiling tools provided

Abstract

Vision-Language Models (VLMs) offer promising capabilities for mobile devices, but their deployment faces significant challenges due to computational limitations and energy inefficiency, especially for real-time applications. This study provides a comprehensive survey of deployment frameworks for VLMs on mobile devices, evaluating llama.cpp, MLC-Imp, and mllm in the context of running LLaVA-1.5 7B, MobileVLM-3B, and Imp-v1.5 3B as representative workloads on a OnePlus 13R. Each deployment framework was evaluated on the OnePlus 13R while running VLMs, with measurements covering CPU, GPU, and NPU utilization, temperature, inference time, power consumption, and user experience. Benchmarking revealed critical performance bottlenecks across frameworks: CPU resources were consistently over-utilized during token generation, while GPU and NPU accelerators were largely unused. When the GPU was…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Embedded Systems Design Techniques