LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages

Xuhan Huang; Qingning Shen; Yan Hu; Anningzhe Gao; Benyou Wang

arXiv:2405.13144·cs.AI·February 18, 2025·1 cites

TL;DR

This paper introduces Mamo, a benchmark for evaluating LLMs' ability to construct mathematical models, revealing current limitations and performance differences across model sizes and types in complex mathematical reasoning tasks.

Contribution

The paper presents a novel process-oriented evaluation framework and a comprehensive benchmark, Mamo, for assessing LLMs' mathematical modeling capabilities.

Findings

01. Larger models perform better on complex tasks
02. Open-source models are competitive on simpler problems
03. All models struggle with advanced mathematical modeling

Abstract

Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, yet their proficiency in mathematical reasoning remains a key challenge. Addressing the gap between natural and mathematical language requires advanced reasoning capabilities, approaching those of Artificial General Intelligence (AGI). However, the evaluation remains challenging, as perfectly representing reality is inherently elusive, and traditional methods like manual or direct comparison of mathematical statements (Ramamonjison et al., 2023) are insufficient for assessing true modeling ability. We propose a process-oriented framework to evaluate LLMs' ability to construct mathematical models, using solvers to compare outputs with ground truth. Introducing Mamo, a benchmark with 1,209 questions covering ordinary differential equations, linear programming, and…
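The solver-based comparison described in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code: the word problem, the function names (`rk4_solve`, `matches_ground_truth`, `candidate_model`, `misread_model`), the hand-written Runge-Kutta solver, and the tolerance are all assumptions chosen for illustration. The idea is that a model-written formulation is never compared textually with a reference; it is handed to a solver, and only the solved answer is checked against the ground truth.

```python
import math
from typing import Callable

# Hypothetical word problem: "A population of 100 grows at a rate equal to
# half its current size per year. What is the population after 2 years?"
# Ground truth from the closed form y(t) = 100 * exp(0.5 * t).
GROUND_TRUTH = 100.0 * math.exp(1.0)


def rk4_solve(f: Callable[[float, float], float],
              t0: float, y0: float, t_end: float, steps: int = 1000) -> float:
    """Integrate dy/dt = f(t, y) from t0 to t_end with classical RK4."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def matches_ground_truth(answer: float, truth: float,
                         rel_tol: float = 1e-4) -> bool:
    """Accept the answer if it agrees with the truth within an assumed tolerance."""
    return math.isclose(answer, truth, rel_tol=rel_tol, abs_tol=1e-9)


# Two hypothetical formulations an LLM might produce for the problem above.
def candidate_model(t: float, y: float) -> float:
    return 0.5 * y   # growth proportional to the current size


def misread_model(t: float, y: float) -> float:
    return 0.5       # misreads the rate as a constant increment


candidate_ok = matches_ground_truth(rk4_solve(candidate_model, 0.0, 100.0, 2.0), GROUND_TRUTH)
misread_ok = matches_ground_truth(rk4_solve(misread_model, 0.0, 100.0, 2.0), GROUND_TRUTH)
print(candidate_ok, misread_ok)
```

Checking the solved value rather than the model's text means two differently written but equivalent formulations would both be accepted, while a formulation that misreads the problem is rejected regardless of how plausible it looks.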

Code & Models

Repositories

freedomintelligence/mamo (Official)

Models

ant-opt/LLMOPT-Qwen2.5-14B (model · 115 downloads · 9 likes)

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Multimodal Machine Learning Applications