KA2L: A Knowledge-Aware Active Learning Framework for LLMs
Haoxuan Yin, Bojian Liu, Chen Tang, Yangfan Wang, Lian Yan, Jingchi Jiang

TL;DR
KA2L is a novel active learning framework that assesses how well a large language model has mastered individual knowledge points and focuses training on the points it has not yet mastered, reducing costs while improving performance.
Contribution
This paper introduces a knowledge distribution probing technique and a hidden-state decoding method that identify the knowledge an LLM has not yet mastered and target it during active learning.
Findings
Reduces annotation and computation costs by 50%
Achieves better performance on three datasets across nine open-source LLMs
Provides insights into knowledge comprehension in LLMs
Abstract
Fine-tuning large language models (LLMs) with high-quality knowledge has been shown to enhance their performance effectively. However, there is a paucity of research on the depth of domain-specific knowledge comprehension by LLMs and the application of targeted active learning to improve their expertise. To address this gap, we introduce the Knowledge-Aware Active Learning (KA2L) framework. This framework assesses LLMs' mastery of specific knowledge points to aid in constructing unanswerable or unknowable questions through latent space analysis. This active learning strategy enhances training efficiency by focusing on knowledge the model has yet to master, thereby minimizing redundancy in learning already acquired information. This study innovatively employs a knowledge distribution probing technique to examine the hidden states of specific Transformer layers and identify the…
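To make the probing input concrete, here is a minimal sketch of one plausible feature-extraction step. It assumes, as one of the reviews below describes, that the probe reads the last token's hidden state from several chosen layers and concatenates them into a single vector. The nested-list layout, the function name last_token_features, and the layer indices are hypothetical; this summary does not say which layers KA2L actually uses.

```python
from typing import List, Sequence

# Hypothetical layout: hidden_states[layer][token] is one hidden vector
# (a list of floats) for each token position at each layer.
HiddenStates = Sequence[Sequence[Sequence[float]]]


def last_token_features(hidden_states: HiddenStates, layers: Sequence[int]) -> List[float]:
    """Concatenate the last token's hidden vector from each chosen layer."""
    features: List[float] = []
    for layer in layers:
        token_vectors = hidden_states[layer]
        if len(token_vectors) == 0:
            raise ValueError(f"layer {layer} has no token vectors")
        # In a decoder-only model, the last position can attend to the whole prompt.
        features.extend(token_vectors[-1])
    return features


# Toy example: 3 layers, 2 tokens, hidden size 2.
hidden = [
    [[0.0, 0.1], [0.2, 0.3]],  # layer 0
    [[1.0, 1.1], [1.2, 1.3]],  # layer 1
    [[2.0, 2.1], [2.2, 2.3]],  # layer 2
]
features = last_token_features(hidden, layers=[1, 2])
print(features)  # [1.2, 1.3, 2.2, 2.3]
```

The resulting vector has length (number of chosen layers) × (hidden size), which is what a small classifier probe would consume.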
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The idea of using semantic entropy to guide data selection in active learning is novel.
2. The results are validated with detailed experiments on nine open-source LLMs.
1. A core claim of this paper is that “a higher SE value suggests greater semantic divergence, indicating that the model has not mastered the knowledge associated with the question” (Sec 3.2.2), but no experiment explicitly validates that a higher SE score corresponds to lower performance, which makes this claim questionable.
2. In the experiments, this paper constructs a D_combine dataset “simulating a standard, unfiltered dataset collected without an active learning s
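The SE score discussed in this review can be illustrated with the standard discrete estimate of semantic entropy: sample M answers to the same question, group them into clusters of equivalent meaning, and compute SE = -Σ_c p(c) log p(c) with p(c) = |c| / M. The sketch below is an assumption about how such a score could be computed, not KA2L's code. In particular, the hypothetical normalize function (case and punctuation folding) stands in for the entailment-based clustering that semantic entropy normally uses.

```python
import math
from collections import Counter
from typing import Callable, Iterable


def normalize(answer: str) -> str:
    """Hypothetical equivalence key: lowercase, drop punctuation, squeeze spaces.

    Semantic entropy normally clusters answers by bidirectional entailment
    with an NLI model; this string key is only a cheap stand-in.
    """
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def semantic_entropy(answers: Iterable[str], key: Callable[[str], str] = normalize) -> float:
    """Discrete semantic entropy: -sum_c p(c) * log p(c) over answer clusters."""
    clusters = Counter(key(a) for a in answers)
    total = sum(clusters.values())
    if total == 0:
        raise ValueError("need at least one sampled answer")
    entropy = 0.0
    for count in clusters.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


# Consistent samples fall into one cluster: entropy 0.
print(semantic_entropy(["Paris", "paris.", " Paris "]))  # 0.0
# Four different answers, one cluster each: entropy ln 4 (about 1.386).
print(semantic_entropy(["Paris", "Lyon", "Marseille", "Nice"]))
```

A low score means the sampled answers agree; a high score means they scatter across meanings. Whether that scatter tracks lower accuracy is exactly what the review asks the authors to show.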
- Thorough empirical validation: extensive ablations, layer-wise probes, comparisons with traditional AL methods, and robustness checks across diverse model families.
- Practical impact: the simple MLP probe and T5-based decoder add negligible inference cost yet yield large savings, and reproducible code is provided.
- Clear writing and well-motivated research questions.
- The novelty should be highlighted. Adding a small model for LLM active learning is not entirely new. What are the most significant differences between the current method and existing ones, e.g., FreeAL? The semantic entropy here is closer to a prediction confidence; please refer to DeepConf (https://arxiv.org/abs/2508.15260) for related work.
- The scope is limited to factual closed-book QA; it is unclear how well the probe transfers to open-ended or reasoning-heavy tasks.
- Generated QA pairs st
- The paper performs a robust evaluation, with experiments on three datasets and nine different LLMs.
- Clear guidance through the experimental section via research questions.
- The paper proposes a simple yet efficient approximation scheme for the LLM's "knowledge" via a binary MLP classifier, along with a method for hyperparameter selection.
- The proposed method is compared to well-known AL methods. However, the setting does not involve an iterative selection scheme but a one-time selection (5000 samples out of 10000), which does not favor any of the compared AL strategies and leads to heavy overlap in the informativeness of the selected samples. As a result, the empirical evidence provided in Sec. 5.3 is rather meaningless.
- While the adaptation of CoreSet seems appropriate, the adaptation of BADGE see
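A back-of-the-envelope check of the overlap argument in this review, offered as an illustration rather than an analysis from the paper: if two strategies each pick k items once from a pool of N, even two independent uniformly random selections share k²/N items in expectation, i.e. a fraction k/N of each selection. With k = 5000 and N = 10000 that is already half. The helper names below are hypothetical.

```python
import random
from typing import Dict, Hashable, Set


def top_k(scores: Dict[Hashable, float], k: int) -> Set[Hashable]:
    """One-shot selection: take the k highest-scoring pool items at once."""
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
    return set(ranked[:k])


def overlap_fraction(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Share of selection a that also appears in selection b."""
    return len(a & b) / len(a) if a else 0.0


def expected_random_overlap(pool_size: int, k: int) -> float:
    """E|A ∩ B| / k for two independent uniform k-subsets of the pool.

    Each item lands in both subsets with probability (k / N) ** 2, so
    E|A ∩ B| = N * (k / N) ** 2 = k ** 2 / N, i.e. a fraction k / N of k.
    """
    return k / pool_size


# One-time selection of 5000 out of 10000, as in the setting the review describes.
rng = random.Random(0)
pool = range(10_000)
random_a = set(rng.sample(pool, 5_000))
random_b = set(rng.sample(pool, 5_000))
observed = overlap_fraction(random_a, random_b)
print(expected_random_overlap(10_000, 5_000))  # 0.5
print(round(observed, 2))  # close to 0.5
```

In an iterative setting, by contrast, each round would select a much smaller batch and re-score the remaining pool after retraining, which gives different strategies more room to diverge.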
The paper is clearly presented, and the proposed method is well-explained. Additional details in the appendix enhance the paper’s credibility. The underlying motivation is clear and well-justified.
1. The research contributions are not clearly articulated. Contributions 1 and 2 (lines 079–087) both claim novelty in the proposed framework, particularly in integrating LLM knowledge distribution probing with hallucination detection. However, Contribution 2 appears to be a subset of Contribution 1. It remains unclear which components of the framework are truly novel, for example, the use of SE, or the extraction and concatenation of the last token’s representations across layers.
2. The experim
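The reviews describe the knowledge probe as a binary MLP classifier over hidden-state features. As a dependency-free sketch, the block below substitutes a logistic-regression probe (a single linear layer) trained by full-batch gradient descent on log loss; an MLP would add a hidden layer in front of the same sigmoid output. The labels (1 = knowledge mastered, 0 = not mastered), the toy data, the 0.5 decision threshold, and all names and hyperparameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probe output: estimated probability that the knowledge is mastered."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_probe(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe by full-batch gradient descent on log loss."""
    rng = random.Random(seed)
    dim = len(X[0])
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy features (e.g. concatenated hidden states) with hypothetical labels:
# 1 = model has mastered the knowledge point, 0 = it has not.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]]
y = [1, 1, 0, 0]
w, b = train_probe(X, y)
print(score(w, b, [1.8, 1.2]) > 0.5)    # True: looks "known"
print(score(w, b, [-1.5, -1.0]) > 0.5)  # False: candidate for targeted training
```

Swapping in a small MLP would change only score and its gradient computation; the selection logic (route low-scoring questions to training) stays the same.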
Taxonomy
Topics: Machine Learning and Algorithms · Topic Modeling · Text Readability and Simplification
