Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile
Samuel Carreira, Tomás Marques, José Ribeiro, Carlos Grilo

TL;DR
This paper demonstrates a novel approach to running a 3-billion-parameter GPT language model directly on mobile devices, enabling private, low-latency AI interactions without network dependency.
Contribution
It introduces a fine-tuned, quantized GPT model capable of running efficiently on low-memory mobile devices, advancing on-device AI capabilities.
Findings
Successful deployment of a 3B parameter GPT on mobile devices
Achieved smooth operation on devices with as little as 4 GB of memory
Enhanced privacy and reduced latency in mobile AI interactions
Abstract
The field of Artificial Intelligence has witnessed remarkable progress in recent years, especially with the emergence of powerful large language models (LLMs) based on the transformer architecture. Cloud-based LLMs, such as OpenAI's ChatGPT, offer impressive capabilities but come with concerns regarding latency and privacy due to network dependencies. This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity. The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as little as 4 GB of memory. Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions…
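To make the role of quantization concrete, the following is a minimal back-of-the-envelope sketch of how much memory the weights of a 3-billion-parameter model would occupy at different precisions. The parameter count comes from the abstract; the bit widths listed are illustrative assumptions, since the summary here does not state which quantization scheme the authors used, and the function name `weight_memory_gib` is hypothetical. The estimate covers weights only and ignores activations, the key-value cache, and runtime overhead.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 3_000_000_000  # 3B parameters, as stated in the abstract

# Bit widths below are illustrative assumptions, not the paper's reported settings.
PRECISIONS = [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]

if __name__ == "__main__":
    for label, bits in PRECISIONS:
        print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, 16-bit weights alone come to roughly 5.6 GiB, which would not fit on a 4 GB device, while 4-bit weights come to roughly 1.4 GiB, which leaves room for the operating system and the rest of the application.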
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Data Quality and Management
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Weight Decay · Residual Connection · Linear Warmup With Cosine Annealing · Layer Normalization
