Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives
Zhihu Wang, Shiwan Zhao, Yu Wang, Heyuan Huang, Sitao Xie, Yubo Zhang, Jiaxin Shi, Zhixing Wang, Hongyan Li, Junchi Yan

TL;DR
Re-TASK introduces a theoretical framework for improving LLM task performance by analyzing and enhancing capabilities, skills, and knowledge, leading to significant gains in domain-specific tasks through targeted prompting strategies.
Contribution
The paper presents Re-TASK, a novel theoretical model and prompting strategy that revisits LLM tasks from capability, skill, and knowledge perspectives, addressing CoT limitations.
Findings
45% performance improvement on Yi-1.5-9B for legal tasks
24.5% performance improvement on Llama3-Chinese-8B
Effective enhancement of LLMs through targeted knowledge and skill injection
Abstract
The Chain-of-Thought (CoT) paradigm has become a pivotal method for solving complex problems with large language models (LLMs). However, its application to domain-specific tasks remains challenging, as LLMs often fail to decompose tasks accurately or execute subtasks effectively. This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from capability, skill, and knowledge perspectives, drawing on the principles of Bloom's Taxonomy and Knowledge Space Theory. While CoT provides a workflow-centric perspective on tasks, Re-TASK introduces a Chain-of-Learning (CoL) paradigm that highlights task dependencies on specific capability items, further broken down into their constituent knowledge and skill components. To address CoT failures, we propose a Re-TASK prompting strategy, which strengthens task-relevant capabilities through targeted knowledge…
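The prompting idea described in the abstract, placing task-relevant knowledge and skill demonstrations ahead of the task itself, can be illustrated with a minimal sketch. This is not the authors' template: `CapabilityItem`, `build_prompt`, the field names, and the placeholder strings are all hypothetical and chosen only to show the overall shape of such a prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CapabilityItem:
    """Hypothetical container for one capability the task depends on."""
    name: str
    knowledge: str      # facts or rules to inject into the prompt
    skill_example: str  # worked demonstration of applying that knowledge


def build_prompt(task: str, items: List[CapabilityItem]) -> str:
    """Assemble a prompt with capability items listed before the task."""
    parts = []
    for i, item in enumerate(items, start=1):
        parts.append(f"Capability {i}: {item.name}")
        parts.append(f"Knowledge: {item.knowledge}")
        parts.append(f"Worked example: {item.skill_example}")
        parts.append("")  # blank line between capability items
    parts.append(f"Task: {task}")
    parts.append("Answer step by step, using the capabilities listed above.")
    return "\n".join(parts)


# Placeholder content only; real items would come from domain material.
items = [
    CapabilityItem(
        name="statute lookup",
        knowledge="<relevant statute text goes here>",
        skill_example="<a solved case applying that statute goes here>",
    )
]
prompt = build_prompt("<the new case to decide goes here>", items)
print(prompt)
```

Keeping knowledge and the worked example as separate fields mirrors the knowledge/skill split the abstract describes; how the items are actually selected and worded is left open here.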
Peer Reviews
Decision: Submitted to ICLR 2025
- The framework’s design is motivated by a learning theory, which adds a theoretical foundation to the methodology.
- The distinction between knowledge and skill acquisition is an interesting concept, reflected in the architecture and prompting strategy.
- The results are promising for domain-specific applications.
- A primary limitation is the choice to limit experiments to open-source models. It’s not clear why the authors couldn’t include a proprietary model like GPT-4, as this would help validate how the framework performs across a wider range of LLMs. The process of constructing capabilities seems relatively manual, which may restrict the framework's scalability to broader, less-defined tasks outside highly specialized domains.
- Constructing capabilities still seems fairly manual at this point, which
1. Integrating ideas from education and cognitive science into improving LLM prompting is quite novel.
Main concerns:
1. The prompting framework requires significant manual effort and domain expertise: it can only be done by experts who can successfully identify the capability items, decompose the task, and apply all structured prompting techniques. The bar for using it seems too high.
2. Most evaluation uses custom datasets, which makes it hard to compare the proposed framework on more general tasks with a wider range of related methods. Also, the fact that two of the three evaluation domains are
[Originality] It's a novel idea to apply Bloom’s Taxonomy and Knowledge Space Theory to LLM prompting. [Significance] LM prompting strategies have large impacts on the performance and are an important topic to study.
[Clarity] The connections to the educational theories seem tenuous. The term "capability item" also seems unnecessarily abstract. A clearer message would be "hints generated by a large LM improve problem solving of smaller LMs". The paper could also benefit from a detailed description of how high-quality hints ("capability items") are generated with large LMs, e.g., what prompting strategy is used? [Significance] It is not surprising that hints from high quality LLMs of ~70B par
Taxonomy
Topics: Artificial Intelligence in Law
