Bench360: Benchmarking Local LLM Inference from 360 Degrees

Linus Stuhlmann; Mauricio Fadel Argerich; Jonathan Fürst

arXiv:2511.16682 · cs.CL · January 15, 2026

TL;DR

Bench360 is a comprehensive benchmarking framework for evaluating local large language model (LLM) inference across diverse tasks, system metrics, and configurations, to inform deployment decisions.

Contribution

It introduces a unified platform that supports multiple inference engines, quantization formats, and custom tasks, filling gaps left by existing, fragmented benchmarks.

Findings

1. Tradeoffs between efficiency and quality are significant.
2. Configuration choices depend on specific workloads and constraints.
3. No universal best configuration exists for local LLM inference.
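The claim that no configuration is universally best can be made precise with Pareto dominance: a configuration is only "best" if no other configuration is at least as good on every metric and strictly better on one. The sketch below is a minimal illustration under that assumption; the `Config` fields, the metric choice, and the example numbers are hypothetical and do not come from Bench360 or its results.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """Hypothetical summary of one configuration; fields are illustrative only."""
    name: str
    quality: float    # task score, higher is better
    latency_s: float  # mean latency in seconds, lower is better
    energy_j: float   # energy per request in joules, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    no_worse = (a.quality >= b.quality
                and a.latency_s <= b.latency_s
                and a.energy_j <= b.energy_j)
    strictly_better = (a.quality > b.quality
                       or a.latency_s < b.latency_s
                       or a.energy_j < b.energy_j)
    return no_worse and strictly_better


def pareto_front(configs):
    """Return the configurations that no other configuration dominates.

    A configuration never dominates itself, so comparing it against the
    full list (including itself) is safe.
    """
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


configs = [
    Config("A", quality=0.80, latency_s=2.0, energy_j=50.0),  # accurate but slow
    Config("B", quality=0.70, latency_s=0.5, energy_j=20.0),  # fast and frugal
    Config("C", quality=0.65, latency_s=1.0, energy_j=30.0),  # worse than B everywhere
]
print([c.name for c in pareto_front(configs)])  # ['A', 'B']
```

When the front contains more than one configuration, as here, the right choice depends on how a given workload weighs quality against latency and energy, which is the situation the findings describe.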

Abstract

Running LLMs locally has become increasingly common, but users face a complex design space across models, quantization levels, inference engines, and serving scenarios. Existing inference benchmarks are fragmented and focus on isolated goals, offering little guidance for practical deployments. We present Bench360, a framework for evaluating local LLM inference across tasks, usage patterns, and system metrics in one place. Bench360 supports custom tasks, integrates multiple inference engines and quantization formats, and reports both task quality and system behavior (latency, throughput, energy, startup time). We demonstrate it on four NLP tasks across three GPUs and four engines, showing how design choices shape efficiency and output quality. Results confirm that tradeoffs are substantial and configuration choices depend on specific workloads and constraints. There is no universal best…
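The system metrics named in the abstract (latency, throughput, energy, startup time) can be derived from per-request timestamps roughly as follows. This is a minimal sketch, not Bench360's implementation: the `RequestTrace` record and `summarize` helper are hypothetical, total energy is assumed to be measured externally (for example by sampling GPU power), and computing throughput over the whole run window is our assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """Hypothetical per-request record; field names are illustrative."""
    start_s: float        # wall-clock time the request was submitted
    first_token_s: float  # wall-clock time the first output token arrived
    end_s: float          # wall-clock time the last output token arrived
    output_tokens: int    # number of generated tokens


def summarize(traces, energy_joules, startup_s):
    """Aggregate request traces into common system metrics."""
    ttft = [t.first_token_s - t.start_s for t in traces]  # time to first token
    e2e = [t.end_s - t.start_s for t in traces]           # end-to-end latency
    total_tokens = sum(t.output_tokens for t in traces)
    # Use the whole run window so overlapping (concurrent) requests
    # are not double-counted when computing throughput.
    window = max(t.end_s for t in traces) - min(t.start_s for t in traces)
    return {
        "mean_ttft_s": mean(ttft),
        "mean_latency_s": mean(e2e),
        "throughput_tok_per_s": total_tokens / window,
        "energy_j_per_token": energy_joules / total_tokens,
        "startup_s": startup_s,
    }


traces = [RequestTrace(0.0, 0.25, 1.0, 40), RequestTrace(0.5, 0.75, 2.0, 60)]
print(summarize(traces, energy_joules=300.0, startup_s=12.5))
```

For these two overlapping toy requests, the sketch reports a mean time to first token of 0.25 s, a mean latency of 1.25 s, 50 tokens/s over the 2 s window, and 3 J per token. Reporting per-token energy rather than total energy makes runs with different output lengths easier to compare.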

Taxonomy

Topics: Advanced Neural Network Applications · Software Engineering Research · Scientific Computing and Data Management