A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving

Yi Zhang; Guangyou Zhou; Zhiwen Xie; Jinjin Ma; Jimmy Xiangji Huang

arXiv:2501.03670·cs.CL·January 8, 2025

Open Access 1 Repo

TL;DR

This paper introduces a novel Diversity-enhanced Knowledge Distillation model for math word problem solving, improving the diversity and accuracy of generated solutions by combining adaptive distillation and a diversity prior with a variational auto-encoder.

Contribution

The paper proposes a diversity-enhanced knowledge distillation approach that combines a diversity prior with adaptive knowledge transfer, improving both the diversity and the accuracy of math word problem solvers.

Findings

01

Achieves higher answer accuracy than strong baselines.

02

Maintains high efficiency for practical applications.

03

Effectively captures solution diversity using a variational auto-encoder.

Abstract

Math Word Problem (MWP) solving, a critical task in natural language processing, has garnered significant research interest in recent years. Many recent studies rely heavily on Seq2Seq models and their extensions (e.g., Seq2Tree and Graph2Tree) to generate mathematical equations. While effective, these models struggle to generate diverse yet equivalent solution equations, which limits their generalization across math problem scenarios. In this paper, we introduce a novel Diversity-enhanced Knowledge Distillation (DivKD) model for practical MWP solving. Our approach proposes an adaptive diversity distillation method, in which a student model learns diverse equations by selectively transferring high-quality knowledge from a teacher model. Additionally, we design a diversity prior-enhanced student model to better capture the diversity distribution of equations by incorporating a…
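As a rough illustration of what "selectively transferring high-quality knowledge" could look like, the sketch below filters teacher-generated candidate equations and keeps only those that evaluate to the gold answer, so the student could be trained on several distinct but correct equations. This is a minimal sketch under stated assumptions, not the paper's method: the quality criterion (answer correctness), the function names `evaluate` and `select_teacher_equations`, and the tolerance `tol` are all hypothetical and do not come from the abstract.

```python
import ast
import operator

# Hypothetical helper set: only the four basic arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr):
    """Evaluate an infix arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


def select_teacher_equations(candidates, gold_answer, tol=1e-4):
    """Keep distinct teacher equations whose value matches the gold answer.

    Assumption: "high-quality" means "evaluates to the gold answer".
    """
    kept, seen = [], set()
    for eq in candidates:
        key = eq.replace(" ", "")  # treat whitespace variants as duplicates
        if key in seen:
            continue
        seen.add(key)
        try:
            value = evaluate(eq)
        except (ValueError, ZeroDivisionError, SyntaxError):
            continue  # malformed or undefined equations are discarded
        if abs(value - gold_answer) <= tol:
            kept.append(eq)
    return kept


# Hypothetical teacher outputs for a problem whose gold answer is 27.
teacher_candidates = [
    "3 * (4 + 5)",    # correct
    "3 * 4 + 3 * 5",  # correct, different form
    "3 * 4 + 5",      # wrong answer
    "3*(4+5)",        # whitespace duplicate of the first
    "27 / 0",         # undefined
]
diverse_targets = select_teacher_equations(teacher_candidates, 27.0)
```

Here `diverse_targets` holds the two distinct correct equations, which a student model could then use as additional training targets alongside the annotated equation.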

Code & Models

Repositories

a773938364/divkd (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Mathematics Education and Teaching Techniques

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Knowledge Distillation · Sequence to Sequence