Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile

Samuel Carreira; Tomás Marques; José Ribeiro; Carlos Grilo

arXiv:2310.01434 · cs.CL · October 4, 2023 · 2 cites

Open Access

TL;DR

This paper demonstrates a novel approach to running a 3-billion-parameter GPT language model directly on mobile devices, enabling private, low-latency AI interactions without network dependency.

Contribution

The paper introduces a fine-tuned, quantized GPT model that runs efficiently on low-memory mobile devices, advancing on-device AI capabilities.

Findings

01 Successful deployment of a 3B-parameter GPT on mobile devices

02 Smooth operation with as little as 4GB of memory

03 Enhanced privacy and reduced latency in mobile AI interactions

Abstract

The field of Artificial Intelligence has witnessed remarkable progress in recent years, especially with the emergence of powerful large language models (LLMs) based on the transformer architecture. Cloud-based LLMs, such as OpenAI's ChatGPT, offer impressive capabilities but come with concerns regarding latency and privacy due to network dependencies. This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity. The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as low as 4GB of memory. Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions…
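
The abstract's claim that a 3-billion-parameter model fits on a 4GB device rests on quantization, and a rough calculation shows why. The sketch below estimates weight storage only, at several bit widths. The bit widths and the helper name `weight_gib` are illustrative assumptions, not settings or code taken from the paper, and the estimate ignores activations, the key-value cache, and quantization metadata.

```python
# Rough weight-only memory estimate for a 3-billion-parameter model.
# Bit widths are illustrative assumptions, not the paper's reported settings.
PARAMS = 3_000_000_000


def weight_gib(params: int, bits_per_weight: int) -> float:
    """Weight storage in GiB, ignoring activations, KV cache, and quantization metadata."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{label}: {weight_gib(PARAMS, bits):.2f} GiB")
```

Under these assumptions, 16-bit weights alone exceed 4GB, while 8-bit and 4-bit weights fit with room left for the operating system and runtime buffers.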

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Data Quality and Management

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Weight Decay · Residual Connection · Linear Warmup With Cosine Annealing · Layer Normalization