AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models
Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar

TL;DR
AGGA is a dataset of 80 academic guidelines on the use of Generative AI and LLMs, collected from official university websites across six continents. It supports NLP tasks and benchmarking in requirements engineering.
Contribution
This paper presents AGGA, a novel dataset of academic guidelines for Generative AIs (GAIs) and Large Language Models (LLMs), enabling new research and evaluation in requirements engineering and related NLP tasks.
Findings
Dataset includes 188,674 words from diverse institutions
Supports tasks like ambiguity detection and requirements categorization
Enables benchmarking for academic guidelines in AI use
Abstract
This study introduces AGGA, a dataset comprising 80 academic guidelines for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in academic settings, meticulously collected from official university websites. The dataset contains 188,674 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, AGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of universities that represent a diverse range of global institutions, including top-ranked universities across six continents. The dataset…
Taxonomy
Topics: Topic Modeling
