Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness
Ding Linghu, Cheng Wang, Da Fan, Wei Shi, Kaifeng Yin, Xiaoliang Xue, Fan Yang, Haiyi Ren, Cong Zhang

TL;DR
This paper introduces a cognitively layered data generation framework for fine-tuning large language models, significantly improving their performance in space situational awareness tasks by creating high-quality, domain-specific datasets.
Contribution
The paper presents BD-FDG, a novel knowledge organization and question modeling framework that enhances dataset quality for domain-specific LLM fine-tuning, demonstrated on SSA tasks.
Findings
SSA-LLM-8B outperforms baseline models by 144-176% in BLEU-1.
Achieves an 82.21% win rate in domain-specific arena comparisons.
Constructs a 230K-sample SSA dataset for effective LLM adaptation.
Abstract
Large language models (LLMs) demonstrate exceptional performance on general-purpose tasks. However, transferring them to complex engineering domains such as space situational awareness (SSA) remains challenging owing to insufficient structural alignment with mission chains, the absence of higher-order cognitive supervision, and poor correspondence between data quality criteria and engineering specifications. The core bottleneck is the construction of high-quality supervised fine-tuning (SFT) datasets. To this end, we propose BD-FDG (Bloom's Taxonomy-based Domain-specific Fine-tuning Data Generation), a framework that addresses incomplete knowledge coverage, shallow cognitive depth, and limited quality controllability through three mechanisms: structured knowledge organization, cognitively layered question modeling, and automated quality control. The framework uses a knowledge tree to…
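As a rough illustration of how a knowledge tree and Bloom's Taxonomy might combine into cognitively layered questions, the following is a minimal Python sketch under our own assumptions. It does not reproduce BD-FDG's actual method. It crosses every leaf of a small knowledge tree with the six levels of the revised Bloom's taxonomy (Remember through Create), then applies a trivial quality filter. The names `KnowledgeNode`, `TEMPLATES`, `leaf_paths`, `layered_prompts`, and `passes_quality_check` are all hypothetical. The question templates and the length-based filter are placeholders, not the paper's prompts or quality criteria.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator

# Revised Bloom's taxonomy levels, ordered from lower- to higher-order cognition.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]

# Hypothetical question templates per level (illustrative only, not from the paper).
TEMPLATES = {
    "Remember": "What is {topic}?",
    "Understand": "Explain how {topic} works in space situational awareness.",
    "Apply": "Use {topic} to solve a concrete SSA task and show the steps.",
    "Analyze": "Which factors limit the accuracy of {topic}, and why?",
    "Evaluate": "Compare alternative approaches to {topic} and justify a choice.",
    "Create": "Design a procedure that extends {topic} to a new SSA mission.",
}


@dataclass
class KnowledgeNode:
    """Hypothetical knowledge-tree node: a topic name plus its subtopics."""
    name: str
    children: list[KnowledgeNode] = field(default_factory=list)


def leaf_paths(node: KnowledgeNode, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield the root-to-leaf path of topic names for every leaf, depth-first."""
    path = prefix + (node.name,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from leaf_paths(child, path)


def layered_prompts(root: KnowledgeNode) -> Iterator[dict]:
    """Cross each leaf topic with every Bloom level to get layered questions."""
    for path in leaf_paths(root):
        for level in BLOOM_LEVELS:
            yield {
                "path": path,
                "level": level,
                "question": TEMPLATES[level].format(topic=path[-1]),
            }


def passes_quality_check(sample: dict, min_words: int = 4) -> bool:
    """Placeholder filter: keep questions that are long enough and well terminated."""
    question = sample["question"]
    return len(question.split()) >= min_words and question.endswith(("?", "."))


if __name__ == "__main__":
    tree = KnowledgeNode("SSA", [
        KnowledgeNode("Orbit determination", [KnowledgeNode("TLE propagation")]),
        KnowledgeNode("Conjunction assessment"),
    ])
    samples = [s for s in layered_prompts(tree) if passes_quality_check(s)]
    print(len(samples))  # 2 leaves x 6 levels = 12
```

Enumerating every leaf at every level makes coverage explicit along both axes the abstract names, knowledge breadth (the tree) and cognitive depth (the Bloom levels). In a real pipeline the templates would presumably be replaced by LLM-generated questions and answers, and the filter by the framework's automated quality control.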
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
