Bridging the Domain Gap in Equation Distillation with Reinforcement Feedback

Wangyang Ying; Haoyue Bai; Nanxu Gong; Xinyuan Wang; Sixun Dong; Haifeng Chen; Yanjie Fu

arXiv:2505.15572·cs.LG·May 22, 2025

TL;DR

This paper introduces a reinforcement learning-based finetuning method to improve foundation models for data-to-equation tasks, enabling better domain adaptation and more accurate, meaningful equation generation from complex data.

Contribution

The paper proposes a reinforcement learning framework that directly optimizes equation generation models using numerical fitness rewards, addressing both domain adaptation and semantic accuracy.
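As a rough illustration of what a numerical fitness reward could look like, the sketch below scores a candidate equation by how closely its predictions match the observed labels. The reward form `1 / (1 + NMSE)` and the names `fitness_reward`, `exact`, `rough`, and `invalid` are assumptions made for illustration; this summary does not specify the paper's actual reward.

```python
import numpy as np

def fitness_reward(candidate, X, y):
    """Hypothetical reward in [0, 1]: higher means a better numerical fit.

    Assumed form: 1 / (1 + NMSE). Not taken from the paper.
    """
    # Candidate equations may hit invalid operations (e.g. log of a negative).
    with np.errstate(all="ignore"):
        pred = np.asarray(candidate(X), dtype=float)
    # Malformed or non-finite outputs receive the minimum reward.
    if pred.shape != y.shape or not np.all(np.isfinite(pred)):
        return 0.0
    # Normalized mean squared error, so the reward is scale-independent.
    nmse = np.mean((pred - y) ** 2) / (np.var(y) + 1e-12)
    return float(1.0 / (1.0 + nmse))

# Synthetic data from a known ground-truth equation.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = X[:, 0] ** 2 + np.sin(X[:, 1])

exact = lambda X: X[:, 0] ** 2 + np.sin(X[:, 1])   # matches the ground truth
rough = lambda X: X[:, 0] ** 2                     # misses the sine term
invalid = lambda X: np.log(X[:, 0] - 10.0)         # NaN on this input range

r_exact = fitness_reward(exact, X, y)
r_rough = fitness_reward(rough, X, y)
r_invalid = fitness_reward(invalid, X, y)
print(r_exact, r_rough, r_invalid)
```

In a reinforcement learning finetuning loop, a scalar like this would be computed for each sampled equation and used as the feedback signal, in place of a token-matching loss.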

Findings

1. Enhanced equation accuracy and robustness on complex datasets
2. Improved domain adaptability of foundation models
3. Outperforms existing methods in equation generation tasks

Abstract

The data-to-equation (Data2Eqn) task aims to discover interpretable mathematical equations that map observed values to labels, offering physical insights and broad applicability across academic and industrial domains. Genetic programming and traditional deep learning-based approaches suffer from search inefficiency and poor generalization on small task-specific datasets. Foundation models have shown promise in this area, but existing approaches have two limitations: 1) they are pretrained on general-purpose data distributions, making them less effective for domain-specific tasks; and 2) their training objectives focus on token-level alignment, overlooking mathematical semantics, which can lead to inaccurate equations. To address these issues, we aim to enhance the domain adaptability of foundation models for Data2Eqn tasks. In this work, we propose a reinforcement learning-based finetuning framework…
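The gap between token-level alignment and mathematical semantics can be seen in a toy example. The sketch below compares candidate equations against a target in two ways: position-wise token agreement and numerical error on sampled inputs. The prefix-notation token vocabulary, the `token_accuracy` metric, and all names here are hypothetical and chosen only for illustration; they are not the paper's tokenization or objective.

```python
import numpy as np

def token_accuracy(pred_tokens, target_tokens):
    """Fraction of positions with matching tokens, over the longer length."""
    n = max(len(pred_tokens), len(target_tokens))
    matches = sum(p == t for p, t in zip(pred_tokens, target_tokens))
    return matches / n

def rmse(f, g, X):
    """Root mean squared difference between two equations on inputs X."""
    return float(np.sqrt(np.mean((f(X) - g(X)) ** 2)))

# Target: x0 * (x1 + 1), written in prefix notation.
target_tokens = ["mul", "x0", "add", "x1", "1"]
target_fn = lambda X: X[:, 0] * (X[:, 1] + 1)

# Candidate A: x0 * x1 + x0. Algebraically equivalent, different tokens.
a_tokens = ["add", "mul", "x0", "x1", "x0"]
a_fn = lambda X: X[:, 0] * X[:, 1] + X[:, 0]

# Candidate B: x0 * (x1 - 1). One token away, numerically different.
b_tokens = ["mul", "x0", "sub", "x1", "1"]
b_fn = lambda X: X[:, 0] * (X[:, 1] - 1)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))

acc_a, err_a = token_accuracy(a_tokens, target_tokens), rmse(a_fn, target_fn, X)
acc_b, err_b = token_accuracy(b_tokens, target_tokens), rmse(b_fn, target_fn, X)
print(f"A: token acc={acc_a:.2f}, rmse={err_a:.3g}")
print(f"B: token acc={acc_b:.2f}, rmse={err_b:.3g}")
```

Candidate A scores poorly on token agreement despite being mathematically equivalent to the target, while candidate B scores well on tokens but fits the data badly. A reward based on numerical fit ranks the two the other way around.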

Taxonomy

Topics: Process Optimization and Integration

Methods: Focus