Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking

Tuowei Wang; Ruwen Fan; Minxing Huang; Zixu Hao; Kun Li; Ting Cao; Youyou Lu; Yaoxue Zhang; Ju Ren

arXiv:2410.19274 · cs.LG · October 14, 2025

Open Access

TL;DR

Neuralink introduces a novel neuron placement strategy leveraging co-activation patterns to significantly accelerate large language model inference on smartphones by optimizing storage access and I/O efficiency.

Contribution

It presents a new neuron placement approach that combines sparsity and storage-level system design to improve LLM inference speed on mobile devices.

Findings

01. Reduces inference latency on smartphones by a factor of 1.49.

02. Is the first approach to optimize storage placement for sparse LLMs.

03. Demonstrates effectiveness across a range of smartphones and models.

Abstract

Large Language Models (LLMs) have achieved remarkable success across various domains, yet deploying them on mobile devices remains an arduous challenge due to their extensive computational and memory demands. While lightweight LLMs have been developed to fit mobile environments, they suffer from degraded model accuracy. In contrast, sparsity-based techniques minimize DRAM usage by selectively transferring only relevant neurons to DRAM while retaining the full model in external storage, such as flash. However, such approaches are critically limited by numerous I/O operations, particularly on smartphones with severe IOPS constraints. In this paper, we propose Neuralink, a novel approach that accelerates LLM inference on smartphones by optimizing neuron placement in flash memory. Neuralink leverages the concept of Neuron Co-Activation, where neurons frequently activated together are…
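
To make the idea of co-activation-aware placement concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function names (`coactivation_counts`, `greedy_placement`, `read_ops`), the toy activation traces, and the nearest-neighbor greedy heuristic are all assumptions chosen for illustration. The sketch counts how often pairs of neurons fire together, orders neurons so that frequent partners sit next to each other, and then counts how many contiguous reads a given activation set would need under that ordering.

```python
from collections import Counter
from itertools import combinations


def coactivation_counts(traces):
    """Count how often each unordered neuron pair is active in the same trace."""
    counts = Counter()
    for active in traces:
        for a, b in combinations(sorted(set(active)), 2):
            counts[(a, b)] += 1
    return counts


def greedy_placement(num_neurons, counts):
    """Hypothetical heuristic: starting from neuron 0, repeatedly append the
    unplaced neuron most often co-activated with the last placed one
    (ties go to the lowest neuron id)."""
    def weight(a, b):
        return counts.get((min(a, b), max(a, b)), 0)

    remaining = set(range(num_neurons))
    order = [0]
    remaining.discard(0)
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda n: weight(last, n))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def read_ops(order, active):
    """Number of contiguous runs (a stand-in for I/O operations) needed to
    read the `active` neurons when they are stored in `order`."""
    pos = sorted(order.index(n) for n in set(active))
    return sum(1 for i, p in enumerate(pos) if i == 0 or p != pos[i - 1] + 1)


# Toy traces (made up for illustration): each list is the set of neurons
# active for one token.
traces = [[0, 3, 5], [0, 3, 5], [1, 2, 4], [1, 2, 4], [0, 3]]
order = greedy_placement(6, coactivation_counts(traces))
baseline = list(range(6))  # neurons stored in their original index order

print("placement:", order)
print("reads, baseline:", read_ops(baseline, [0, 3, 5]))
print("reads, co-activation order:", read_ops(order, [0, 3, 5]))
```

In this toy setting, neurons that fire together end up adjacent, so one activation set can be fetched with fewer, larger reads. That is the property that matters when storage is limited by IOPS rather than bandwidth; a real system would also have to handle layout cost, neuron size, and activation sets that overlap only partially.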

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: ALIGN