M100: An Orchestrated Dataflow Architecture Powering General AI Computing

Yan Xie; Changkui Mao; Changsong Wu; Chao Lu; Chao Suo; Cheng Qian; Chun Yang; Danyang Zhu; Hengchang Xiong; Hongzhan Lu; Hongzhen Liu; Jiafu Liu; Jie Chen; Jie Dai; Junfeng Tang; Kai Liu; Kun Li; Lipeng Ge; Meng Sun; Min Luo; Peng Chen; Peng Wang; Shaodong Yang; Shibin Tang; Shibo Chen; Weikang Zhang; Xiao Ling; Xiaobo Du; Xin Wu; Yang Liu; Yi Jiang; Yihua Jin; Yin Huang; Yuli Zhang; Zhen Yuan; Zhiyuan Man; Zhongxiao Yao

arXiv:2604.17862 · cs.LG · April 21, 2026

TL;DR

M100 is a dataflow architecture for efficient, cost-effective general AI inference across applications such as autonomous driving and large language models. The paper reports that it outperforms GPGPU architectures on autonomous driving benchmarks.

Contribution

The paper introduces M100, a dataflow parallel architecture with compiler-architecture co-design that enhances AI inference efficiency and scalability without relying on caching.

Findings

01

M100 outperforms GPGPU architectures in autonomous driving benchmarks.

02

M100 eliminates caching by using data streams for tensor computations.

03

M100 achieves higher utilization and efficiency across diverse AI workloads.

Abstract

As deep learning-based AI technologies gain momentum, the demand for general-purpose AI computing architectures continues to grow. While GPGPU-based architectures offer versatility for diverse AI workloads, they often fall short in efficiency and cost-effectiveness. Various Domain-Specific Architectures (DSAs) excel at particular AI tasks but struggle to extend across broader applications or adapt to the rapidly evolving AI landscape. M100 is Li Auto's response: a performant, cost-effective architecture for AI inference in Autonomous Driving (AD), Large Language Models (LLMs), and intelligent human interactions, domains crucial to today's most competitive automobile platforms. M100 employs a dataflow parallel architecture, where compiler-architecture co-design orchestrates not only computation but, more critically, data movement across time and space. Leveraging dataflow computing…
