A fully synthetic textual dataset of student learning habits and preferences generated using a large language model
Mehedi Hasan

TL;DR
This paper introduces a synthetic dataset of student learning habits and preferences generated using a large language model to support educational research without privacy concerns.
Contribution
The novelty lies in a fully synthetic, privacy-preserving dataset of student learning habits and preferences, generated with a large language model.
Findings
The dataset contains 10,000 records with attributes such as education level, study hours, preferred learning methods, learning challenges, and motivation levels.
It is designed for benchmarking educational NLP pipelines and evaluating synthetic data generation techniques.
The dataset uses controlled distributions and avoids real personal information to ensure privacy.
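A minimal sketch of how a user might inspect the distributions in a CSV of this kind, using only the Python standard library. The column names and sample rows below are hypothetical and chosen for illustration; they are not the dataset's actual schema or contents.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the column names and values are assumptions,
# not the published dataset's actual schema.
SAMPLE_CSV = """education_level,study_hours,preferred_method,motivation
Undergraduate,3,Video lectures,High
High school,2,Reading,Medium
Undergraduate,5,Practice problems,High
Graduate,4,Video lectures,Low
"""


def load_records(text):
    """Parse CSV text into a list of dicts, one per fictional student."""
    return list(csv.DictReader(io.StringIO(text)))


def category_shares(records, column):
    """Return the fraction of records holding each value in `column`."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


records = load_records(SAMPLE_CSV)
shares = category_shares(records, "education_level")
for value, share in sorted(shares.items()):
    print(f"{value}: {share:.2f}")
```

Comparing the observed shares against the intended proportions is one simple way to confirm that a generated file follows its controlled distributions; to read a real file, the in-memory string can be replaced with an opened file handle.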
Abstract
Educational data mining and learning analytics have become important research areas for supporting pedagogical analysis, algorithm development, and privacy-preserving educational research. The advancement of natural language processing (NLP) methods in educational contexts depends on the availability of structured and well-documented textual datasets; however, access to real student data is often restricted due to ethical, legal, and privacy concerns. This article presents a fully synthetic textual dataset of student learning habits and preferences generated using a large language model (LLM). The dataset contains 10,000 CSV-formatted records representing fictional students and includes attributes such as education level, study hours, preferred learning methods, learning challenges, motivation levels, opinions on online learning, and primary devices used for study. Data generation was…
Taxonomy
Topics: Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning · Mental Health via Writing
