Unlocking Model Insights: A Dataset for Automated Model Card Generation
Shruti Singh, Hitesh Lodwal, Husain Malwat, Rakesh Thakur, Mayank Singh

TL;DR
This paper introduces a dataset of question-answer pairs about ML models to automate model card generation, aiming to improve transparency and reduce manual effort in documenting model details.
Contribution
It provides a new dataset for training models to automatically generate comprehensive model cards from research papers.
Findings
Current LMs show limited understanding of research papers.
Automated model card generation can be improved with specialized datasets.
The dataset enables training models to better extract model details from papers.
Abstract
Language models (LMs) are no longer restricted to the ML community, and instruction-tuned LMs have led to a rise in autonomous AI agents. As the accessibility of LMs grows, it is imperative that our understanding of their capabilities, intended usage, and development cycle improves as well. Model cards are a popular practice for documenting detailed information about an ML model. To automate model card generation, we introduce a dataset of 500 question-answer pairs for 25 ML models that cover crucial aspects of each model, such as its training configurations, datasets, biases, architecture details, and training resources. We employ annotators to extract the answers from the original papers. Further, we explore the capabilities of LMs in generating model cards by answering questions. Our initial experiments with ChatGPT-3.5, LLaMa, and Galactica showcase a significant gap in the understanding of…
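The dataset pairs each model with questions about specific aspects (500 pairs over 25 models, an average of 20 questions per model). As a rough illustration of how such question-answer pairs could be assembled into a model card, the sketch below groups pairs by model and by aspect and renders a plain-text card. The record layout (`QAPair` with `model`, `aspect`, `question`, `answer` fields), the function `build_model_cards`, and the card format are all hypothetical assumptions for illustration, not the authors' released schema or pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; the paper's actual dataset schema may differ.
@dataclass(frozen=True)
class QAPair:
    model: str     # name of the ML model the question is about
    aspect: str    # e.g. "datasets", "architecture", "biases"
    question: str
    answer: str    # answer an annotator extracted from the original paper


def build_model_cards(pairs: list[QAPair]) -> dict[str, str]:
    """Group QA pairs by model, then by aspect, and render one text card per model."""
    by_model = defaultdict(lambda: defaultdict(list))
    for pair in pairs:
        by_model[pair.model][pair.aspect].append(pair)

    cards = {}
    for model, aspects in by_model.items():
        lines = [f"Model card: {model}"]
        # Sort aspects so the card layout is deterministic.
        for aspect in sorted(aspects):
            lines.append(f"[{aspect}]")
            for pair in aspects[aspect]:
                lines.append(f"Q: {pair.question}")
                lines.append(f"A: {pair.answer}")
        cards[model] = "\n".join(lines)
    return cards


# Usage with placeholder data (hypothetical model name and answers).
pairs = [
    QAPair("model-a", "datasets", "Which datasets was the model trained on?", "<answer from paper>"),
    QAPair("model-a", "architecture", "How many layers does the model have?", "<answer from paper>"),
]
cards = build_model_cards(pairs)
print(cards["model-a"])
```

In an automated setting, the `answer` fields would come from an LM answering each question over the paper text rather than from annotators; the annotated answers then serve as the reference against which generated answers are compared.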
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Galactica
