WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications

Xin Li; Mengbing Liu; Li Wei; Jiancheng An; Mérouane Debbah; Chau Yuen

arXiv:2505.14354·cs.CL·May 21, 2025

TL;DR

WirelessMathBench is a new benchmark designed to evaluate LLMs on complex mathematical modeling tasks specific to wireless communications, revealing current models' limitations in domain-specific reasoning.

Contribution

We introduce WirelessMathBench, a comprehensive benchmark with 587 questions from research papers to assess LLMs' capabilities in wireless communication mathematics.

Findings

01. LLMs perform poorly on complex equation reconstruction tasks.
02. DeepSeek-R1 achieves only 38.05% average accuracy on the benchmark.
03. Current LLMs struggle with domain-specific mathematical reasoning.

Abstract

Large Language Models (LLMs) have achieved impressive results across a broad array of tasks, yet their capacity for complex, domain-specific mathematical reasoning, particularly in wireless communications, remains underexplored. In this work, we introduce WirelessMathBench, a novel benchmark specifically designed to evaluate LLMs on mathematical modeling challenges in wireless communications engineering. Our benchmark consists of 587 meticulously curated questions sourced from 40 state-of-the-art research papers. The tasks range from basic multiple-choice questions to complex equation completion tasks, including both partial and full completions, all of which rigorously adhere to physical and dimensional constraints. Through extensive experimentation with leading LLMs, we observe that while many models excel in basic recall tasks, their performance…

Code & Models

Datasets

XINLI1997/WirelessMathBench
dataset· 10 dl

Videos

WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications· underline

Taxonomy

Topics: Cooperative Communication and Network Coding · IPv6, Mobility, Handover, Networks, Security · Wireless Networks and Protocols

Methods: Fast Attention Via Positive Orthogonal Random Features · Performer