FlexiDataGen: An Adaptive LLM Framework for Dynamic Semantic Dataset Generation in Sensitive Domains
Hamed Jelodar, Samita Bai, Roozbeh Razavi-Far, Ali A. Ghorbani

TL;DR
FlexiDataGen is an adaptive LLM framework that dynamically generates high-quality, domain-specific datasets to address data scarcity and privacy constraints in sensitive fields like healthcare and cybersecurity.
Contribution
It introduces FlexiDataGen, a framework that integrates four core components to autonomously synthesize semantically coherent datasets tailored to sensitive domains.
Findings
Effectively alleviates data shortages in high-stakes domains.
Enables scalable and accurate model development with synthetic data.
Demonstrates high-quality, domain-relevant data generation.
Abstract
Dataset availability and quality remain critical challenges in machine learning, especially in domains where data are scarce, expensive to acquire, or constrained by privacy regulations. Fields such as healthcare, biomedical research, and cybersecurity frequently encounter high data acquisition costs, limited access to annotated data, and the rarity or sensitivity of key events. These issues, collectively referred to as the dataset challenge, hinder the development of accurate and generalizable machine learning models in such high-stakes domains. To address this, we introduce FlexiDataGen, an adaptive large language model (LLM) framework designed for dynamic semantic dataset generation in sensitive domains. FlexiDataGen autonomously synthesizes rich, semantically coherent, and linguistically diverse datasets tailored to specialized fields. The framework integrates four core components:…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
