We Need Knowledge Distillation for Solving Math Word Problems

Zhenquan Shen; Xinguo Yu; Xiaotian Cheng; Rao Peng; Hao Ming

arXiv:2507.02982 · cs.CL · July 8, 2025

TL;DR

This paper shows that knowledge distillation can compress large language models for math word problem solving. A student model with 1/12 of the teacher's parameters retains about 90% of its performance, which substantially reduces computational cost in educational applications.

Contribution

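It introduces a method for compressing LLMs for math word problems (MWPs) via vector distillation: BERT-encoded embedding vectors are compressed and distilled into a much smaller student model that preserves performance and generalizability. It also identifies the linguistic features, notably part-of-speech information, that most influence compressibility.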

Findings

01. The student model retains ~90% of the teacher's performance.

02. The distilled model is task-agnostic and generalizes well across MWP-related tasks.

03. Part-of-speech information is crucial for MWP compressibility.
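Finding 01 can be read together with the parameter count reported in the abstract (1/12 of the teacher's parameters). Assuming "performance" means the ratio of the student's task score to the teacher's, the trade-off works out as follows. The symbols $S$ (student), $T$ (teacher), and $\theta$ (parameters) are notation introduced here, not the paper's.

```latex
\frac{\mathrm{Perf}(S)}{\mathrm{Perf}(T)} \approx 0.90,
\qquad
\frac{|\theta_S|}{|\theta_T|} = \frac{1}{12} \approx 0.083
\quad\Longrightarrow\quad
1 - \frac{|\theta_S|}{|\theta_T|} = \frac{11}{12} \approx 0.917
```

Roughly 92% of the teacher's parameters are removed at a cost of roughly 10% of its performance.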

Abstract

The enhancement of mathematical capabilities in large language models (LLMs) fosters new developments in mathematics education within primary and secondary schools, particularly as they relate to intelligent tutoring systems. However, LLMs require substantial computational resources, resulting in significant costs in educational contexts. To mitigate this drawback, this paper investigates the feasibility of compressing LLMs for solving math word problems (MWPs). We compress the embedded vectors encoded by BERT and distill a considerably smaller student model. Our findings indicate that the student model can maintain nearly 90% of the performance of the teacher model while utilizing only 1/12 of its parameters. In addition to achieving high accuracy, the model exhibits strong generalizability, as the compressed vectors perform well across all tasks related to MWPs, and the distillation…
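The abstract describes compressing BERT-encoded vectors and distilling them into a much smaller student. The following is a minimal sketch of embedding-level distillation in its simplest form, offered for illustration only. A frozen teacher produces target vectors, and a smaller student is trained with a mean-squared-error loss to reproduce them. Everything here is an assumption, not the paper's method:

- `ToyTeacher` is a hypothetical two-layer linear stand-in for BERT.
- `ToyStudent` is a hypothetical single linear layer.
- `distill`, `matvec`, `rand_matrix`, and `mse` are illustrative helpers.
- The MSE objective, dimensions, and learning rate were chosen for the toy; this summary does not specify the paper's values.

The sizes give a teacher with 384 parameters and a student with 32. That 12:1 ratio only echoes the reported 1/12 figure and says nothing about the real architecture.

```python
import random


def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.5):
    """Return a rows x cols matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


class ToyTeacher:
    """Hypothetical stand-in for a BERT encoder: a frozen two-layer linear map."""

    def __init__(self, d_in, d_hidden, d_out, rng):
        self.W1 = rand_matrix(d_hidden, d_in, rng)
        self.W2 = rand_matrix(d_out, d_hidden, rng)

    def embed(self, x):
        return matvec(self.W2, matvec(self.W1, x))

    def num_params(self):
        return len(self.W1) * len(self.W1[0]) + len(self.W2) * len(self.W2[0])


class ToyStudent:
    """Hypothetical smaller student: one linear layer trained to mimic the teacher."""

    def __init__(self, d_in, d_out, rng):
        self.W = rand_matrix(d_out, d_in, rng, scale=0.1)

    def embed(self, x):
        return matvec(self.W, x)

    def num_params(self):
        return len(self.W) * len(self.W[0])


def distill(teacher, student, inputs, epochs=200, lr=0.05):
    """Fit the student to frozen teacher vectors with per-sample SGD on MSE.

    Returns the mean loss of each epoch (each sample's loss is measured before its update).
    """
    targets = [teacher.embed(x) for x in inputs]  # teacher vectors are computed once, never updated
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(inputs, targets):
            y = student.embed(x)
            total += mse(y, t)
            for i in range(len(y)):
                grad_y = 2.0 * (y[i] - t[i]) / len(y)  # dMSE/dy_i
                for j in range(len(x)):
                    student.W[i][j] -= lr * grad_y * x[j]  # dy_i/dW_ij = x_j
        history.append(total / len(inputs))
    return history


rng = random.Random(0)
teacher = ToyTeacher(d_in=8, d_hidden=32, d_out=4, rng=rng)
student = ToyStudent(d_in=8, d_out=4, rng=rng)
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(64)]

history = distill(teacher, student, inputs)
print(f"params: teacher={teacher.num_params()}, student={student.num_params()}")
print(f"mean MSE: first epoch={history[0]:.4f}, last epoch={history[-1]:.6f}")
```

In this toy the teacher is itself linear, so a linear student can match it almost exactly and the loss falls toward zero. With a nonlinear encoder such as BERT, a much smaller student can only approximate the teacher's vectors. That is consistent with a retention figure below 100%.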
