A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving
Yi Zhang, Guangyou Zhou, Zhiwen Xie, Jinjin Ma, Jimmy Xiangji Huang

TL;DR
This paper introduces a novel Diversity-enhanced Knowledge Distillation model for math word problem solving, improving the diversity and accuracy of generated solutions by combining adaptive distillation and a diversity prior with a variational auto-encoder.
Contribution
The paper proposes a diversity-enhanced knowledge distillation approach that pairs a diversity prior with adaptive knowledge transfer, improving both the diversity and the accuracy of math word problem solvers.
Findings
Achieves higher answer accuracy than strong baselines.
Maintains high efficiency for practical applications.
Effectively captures solution diversity using a variational auto-encoder.
Abstract
Math Word Problem (MWP) solving, a critical task in natural language processing, has garnered significant research interest in recent years. Many recent studies rely heavily on Seq2Seq models and their extensions (e.g., Seq2Tree and Graph2Tree) to generate mathematical equations. While effective, these models struggle to generate diverse but counterpart solution equations, which limits their generalization across math problem scenarios. In this paper, we introduce a novel Diversity-enhanced Knowledge Distillation (DivKD) model for practical MWP solving. Our approach proposes an adaptive diversity distillation method, in which a student model learns diverse equations by selectively transferring high-quality knowledge from a teacher model. Additionally, we design a diversity prior-enhanced student model to better capture the diversity distribution of equations by incorporating a…
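The idea of selectively transferring teacher knowledge can be illustrated with a small sketch. The abstract does not state the selection criterion, so this example assumes that a teacher-proposed equation counts as "high-quality" when it evaluates to the gold answer. The function names `evaluate` and `select_teacher_equations`, the tolerance, and the sample equations are all hypothetical and are not taken from the paper.

```python
import ast
import operator

# Arithmetic operators allowed in candidate equations (an assumption of this sketch).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"malformed expression: {expression!r}") from exc
    return _eval(tree)


def select_teacher_equations(candidates, gold_answer, tol=1e-4):
    """Keep distinct teacher equations whose value matches the gold answer.

    Assumed filter: an equation is usable as a distillation target only if
    it is well-formed and computes the correct answer.
    """
    kept, seen = [], set()
    for expr in candidates:
        try:
            value = evaluate(expr)
        except (ValueError, ZeroDivisionError):
            continue  # discard malformed or undefined equations
        if abs(value - gold_answer) <= tol and expr not in seen:
            kept.append(expr)
            seen.add(expr)
    return kept


# Hypothetical teacher outputs for a problem whose answer is 18.
teacher_outputs = [
    "3 * (4 + 2)",    # correct
    "3 * 4 + 3 * 2",  # correct, different form
    "3 * 4 + 2",      # wrong answer (14)
    "3 * (4 + 2)",    # duplicate
    "18 / 0",         # undefined
]
targets = select_teacher_equations(teacher_outputs, gold_answer=18.0)
print(targets)
```

Under this assumed filter, the student would be trained on both surviving forms of the equation rather than on a single reference, which is one simple way to expose it to solution diversity.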
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Mathematics Education and Teaching Techniques
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Knowledge Distillation · Sequence to Sequence
