ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation
Dimitris Gkoumas

TL;DR
This paper introduces ALMol, which applies a novel contrastive preference optimisation training approach to language-molecule translation with LLMs, achieving up to a 32% improvement over counterpart models while training on only 10% of the data, and proposes a new evaluation method for detecting hallucinations.
Contribution
The paper presents a new training approach for language-molecule translation, contrastive preference optimisation, that improves performance and generalisability when training data is limited.
Findings
Achieves up to 32% performance improvement over baseline models.
Effective with only 10% of training data, reducing data dependency.
Proposes a new evaluation method to detect hallucinations in LLMs.
Abstract
The intersection of chemistry and Artificial Intelligence (AI) is an area of active research that aims to accelerate scientific discovery. The integration of large language models (LLMs) with scientific modalities has shown significant promise in this endeavour. However, challenges persist in effectively addressing training efficacy and the out-of-distribution problem, particularly as existing approaches rely on larger models and datasets. In this context, we focus on machine language-molecule translation and deploy a novel training approach called contrastive preference optimisation, which avoids generating translations that are merely adequate but not perfect. To ensure generalisability and mitigate memorisation effects, we conduct experiments using only 10% of the data. Our results demonstrate that our models achieve up to a 32% improvement compared to counterpart models.…
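The abstract names contrastive preference optimisation but does not state the loss. As a rough illustration only, the sketch below follows the commonly cited form of a CPO-style objective: a preference term that pushes the preferred translation above the dispreferred one, plus a negative log-likelihood term on the preferred translation. The function name `cpo_loss`, the parameters `beta` and `nll_weight`, and the example log-probabilities are assumptions made for this sketch, not values or code taken from the paper.

```python
import math


def cpo_loss(logp_chosen, logp_rejected, beta=0.1, nll_weight=1.0):
    """CPO-style loss for one (preferred, dispreferred) translation pair.

    logp_chosen / logp_rejected are the sequence log-probabilities the
    model assigns to the preferred and dispreferred outputs.
    beta and nll_weight are hypothetical hyperparameters for this sketch.
    """
    # Scaled log-probability margin between the two candidates.
    margin = beta * (logp_chosen - logp_rejected)
    # Preference term: -log(sigmoid(margin)), computed stably as softplus(-margin).
    prefer = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # Likelihood term: keeps the model close to the preferred output.
    nll = -logp_chosen
    return prefer + nll_weight * nll


# The loss is lower when the model already favours the preferred translation.
good = cpo_loss(logp_chosen=-1.0, logp_rejected=-5.0)
bad = cpo_loss(logp_chosen=-5.0, logp_rejected=-1.0)
print(good < bad)
```

No reference model appears in this form, so only the policy's own log-probabilities are needed for each pair, which is one reason such objectives are described as memory-efficient in an offline setting.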
Taxonomy
Topics: Machine Learning in Materials Science · Fuel Cells and Related Materials · Natural Language Processing Techniques
