Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning
Yuxuan Ren, Dihan Zheng, Chang Liu, Peiran Jin, Yu Shi, Lin Huang, Jiyan He, Shengjie Luo, Tao Qin, Tie-Yan Liu

TL;DR
This paper introduces a physics-informed consistency training method that enables multi-task learning models to effectively integrate heterogeneous molecular data, improving predictions by leveraging physical laws and cross-task information exchange.
Contribution
It proposes a novel consistency training approach that exploits physical laws to connect different molecular tasks, enhancing multi-task learning with heterogeneous data.
Findings
Energy data improves structure prediction accuracy.
Force and off-equilibrium data enhance structure prediction.
Physical consistency enables better data integration.
Abstract
In recent years, machine learning has demonstrated impressive capability in handling molecular science tasks. To support various molecular properties at scale, machine learning models are trained in the multi-task learning paradigm. Nevertheless, data of different molecular properties are often not aligned: some quantities, e.g. equilibrium structure, demand more cost to compute than others, e.g. energy, so their data are often generated by cheaper computational methods at the cost of lower accuracy, which cannot be directly overcome through multi-task learning. Moreover, it is not straightforward to leverage abundant data of other tasks to benefit a particular task. To handle such data heterogeneity challenges, we exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches that allow different tasks to exchange…
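As a rough illustration of how a physical law can link an energy task to a structure task, the sketch below uses one well-known relation: an equilibrium structure sits at a minimum of the potential energy, so the force (the negative gradient of the energy) should vanish there. This is a toy example under stated assumptions, not the authors' implementation. The harmonic two-atom `toy_energy`, the finite-difference `numerical_force`, and the name `zero_force_consistency_loss` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_energy(coords, k=1.0, r0=1.0):
    """Hypothetical stand-in for an energy model: one harmonic bond between two atoms."""
    d = np.linalg.norm(coords[0] - coords[1])
    return 0.5 * k * (d - r0) ** 2

def numerical_force(energy_fn, coords, eps=1e-5):
    """Force = -dE/dx, estimated with central finite differences."""
    force = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        step = np.zeros_like(coords)
        step[idx] = eps
        force[idx] = -(energy_fn(coords + step) - energy_fn(coords - step)) / (2 * eps)
    return force

def zero_force_consistency_loss(energy_fn, predicted_coords):
    """Mean squared force at the predicted structure; it vanishes at an energy minimum."""
    f = numerical_force(energy_fn, predicted_coords)
    return float(np.mean(f ** 2))

# Two candidate "predicted structures" for the toy two-atom system (assumed values).
good = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # bond length equals r0
bad = np.array([[0.0, 0.0, 0.0], [1.3, 0.0, 0.0]])   # stretched bond

loss_good = zero_force_consistency_loss(toy_energy, good)
loss_bad = zero_force_consistency_loss(toy_energy, bad)
```

In this toy setting the penalty is near zero for the structure that agrees with the energy function and clearly positive for the stretched one, which is the kind of signal that would let an energy predictor inform a structure predictor without needing extra structure labels.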
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods
