Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo
Nakul Rampal, Kaiyu Wang, Matthew Burigana, Lingxiang Hou, Juri Al-Johani, Anna Sackmann, Hanan S. Murayshid, Walaa Abdullah Al-Sumari, Arwa M. Al-Abdulkarim, Nahla Eid Al-Hazmi, Majed O. Al-Awad, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

TL;DR
This paper introduces RetChemQA, a large-scale benchmark dataset with single-hop and multi-hop questions for reticular chemistry, generated using GPT-4 Turbo, to evaluate machine learning models' understanding of complex scientific literature.
Contribution
The paper presents a novel, extensive dataset specifically designed for reticular chemistry, created with GPT-4 Turbo, including questions and synthesis conditions for advancing AI research in this domain.
Findings
The dataset contains approximately 45,000 Q&A pairs for each question type (single-hop and multi-hop).
Includes synthesis conditions extracted from over 2,500 research papers.
Provides a platform for evaluating AI models on complex scientific tasks.
Abstract
The rapid advancement in artificial intelligence and natural language processing has led to the development of large-scale datasets aimed at benchmarking the performance of machine learning models. Herein, we introduce 'RetChemQA,' a comprehensive benchmark dataset designed to evaluate the capabilities of such models in the domain of reticular chemistry. This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type. The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group, among others. The dataset has been generated using OpenAI's GPT-4 Turbo, a cutting-edge model known for its exceptional language understanding and generation capabilities. In addition to the Q&A dataset, we also release a…
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Adam
