SCIMAT: Science and Mathematics Dataset
Neeraj Kollepara, Snehith Kumar Chatakonda, Pawan Kumar

TL;DR
This paper introduces SCIMAT, a large open-source dataset of science and mathematics problems for pre-college and college levels, aiming to facilitate research and improve problem-solving models.
Contribution
It provides a comprehensive, well-curated dataset for science and mathematics problems, along with initial transformer-based results and challenges for future research.
Findings
Preliminary transformer results demonstrate potential.
The dataset includes challenging problems for advanced research.
Invites exploration of better architectures for problem solving.
Abstract
In this work, we announce a comprehensive, well-curated, open-source dataset with millions of samples of pre-college and college-level problems in mathematics and science. A preliminary set of results using a transformer architecture with character-to-character encoding is shown. The dataset identifies some challenging problems and invites research on better architecture search.
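The abstract mentions character-to-character encoding without giving details. As a rough illustration only, the sketch below shows one common way to turn a question/answer pair into fixed-length integer sequences, one id per character, for a sequence-to-sequence transformer. The special tokens, the helper names (`build_vocab`, `encode`, `decode`), the padding lengths, and the sample problem are all assumptions made for this example; they are not taken from the SCIMAT paper or its code.

```python
# Hypothetical sketch of character-level encoding; not the authors' implementation.

SPECIALS = ["<pad>", "<sos>", "<eos>"]  # assumed special tokens, ids 0, 1, 2


def build_vocab(texts):
    """Map each special token and each distinct character to an integer id."""
    chars = sorted({ch for text in texts for ch in text})
    return {tok: i for i, tok in enumerate(SPECIALS + chars)}


def encode(text, vocab, max_len):
    """Wrap text in <sos>/<eos>, map characters to ids, and right-pad to max_len."""
    ids = [vocab["<sos>"]] + [vocab[ch] for ch in text] + [vocab["<eos>"]]
    if len(ids) > max_len:
        raise ValueError("text does not fit in max_len")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def decode(ids, vocab):
    """Invert encode: stop at <eos>, skip <sos> and <pad>."""
    inverse = {i: tok for tok, i in vocab.items()}
    chars = []
    for i in ids:
        tok = inverse[i]
        if tok == "<eos>":
            break
        if tok in ("<pad>", "<sos>"):
            continue
        chars.append(tok)
    return "".join(chars)


# Made-up sample in the spirit of a pre-college algebra problem.
question = "Solve 2*x + 3 = 11 for x."
answer = "4"

vocab = build_vocab([question, answer])
src = encode(question, vocab, max_len=32)  # encoder input
tgt = encode(answer, vocab, max_len=4)     # decoder target
```

Working at the character level keeps the vocabulary small and avoids a separate tokenizer for digits and operators, at the cost of longer sequences than a word-level or subword scheme would produce.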
Taxonomy
Topics: Topic Modeling · Algorithms and Data Compression · Computational Physics and Python Applications
