Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R
Pablo Robin Guerrero, Yueyang Pan, Sanidhya Kashyap

TL;DR
This paper evaluates the deployment of vision-language models (VLMs) on the OnePlus 13R, identifies hardware bottlenecks, and provides benchmarks and profiling tools aimed at improving real-time mobile performance.
Contribution
The paper benchmarks and profiles three VLM deployment frameworks (llama.cpp, MLC-Imp, and mllm) on a mobile device and reveals key hardware utilization issues.
Findings
CPU over-utilization during token generation
GPU and NPU underutilization and saturation issues
Framework-level benchmarks and profiling tools provided
Abstract
Vision-Language Models (VLMs) offer promising capabilities for mobile devices, but their deployment faces significant challenges due to computational limitations and energy inefficiency, especially for real-time applications. This study provides a comprehensive survey of deployment frameworks for VLMs on mobile devices, evaluating llama.cpp, MLC-Imp, and mllm with LLaVA-1.5 7B, MobileVLM-3B, and Imp-v1.5 3B as representative workloads on a OnePlus 13R. Measurements for each framework covered CPU, GPU, and NPU utilization, temperature, inference time, power consumption, and user experience. Benchmarking revealed critical performance bottlenecks across frameworks: CPU resources were consistently over-utilized during token generation, while GPU and NPU accelerators were largely unused. When the GPU was…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Embedded Systems Design Techniques
