Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives

Zhihu Wang; Shiwan Zhao; Yu Wang; Heyuan Huang; Sitao Xie; Yubo Zhang; Jiaxin Shi; Zhixing Wang; Hongyan Li; Junchi Yan

arXiv:2408.06904 · cs.CL · June 23, 2025 · 2 citations


Open Access · 3 Reviews

TL;DR

Re-TASK introduces a theoretical framework for improving LLM task performance by analyzing and enhancing capabilities, skills, and knowledge, leading to significant gains in domain-specific tasks through targeted prompting strategies.

Contribution

The paper presents Re-TASK, a novel theoretical model and prompting strategy that revisits LLM tasks from capability, skill, and knowledge perspectives to address the limitations of Chain-of-Thought (CoT) prompting on domain-specific tasks.

Findings

1. 45% performance improvement on Yi-1.5-9B for legal tasks
2. 24.5% performance improvement on Llama3-Chinese-8B
3. Effective enhancement of LLMs through targeted knowledge and skill injection

Abstract

The Chain-of-Thought (CoT) paradigm has become a pivotal method for solving complex problems with large language models (LLMs). However, its application to domain-specific tasks remains challenging, as LLMs often fail to decompose tasks accurately or execute subtasks effectively. This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from capability, skill, and knowledge perspectives, drawing on the principles of Bloom's Taxonomy and Knowledge Space Theory. While CoT provides a workflow-centric perspective on tasks, Re-TASK introduces a Chain-of-Learning (CoL) paradigm that highlights task dependencies on specific capability items, further broken down into their constituent knowledge and skill components. To address CoT failures, we propose a Re-TASK prompting strategy, which strengthens task-relevant capabilities through targeted knowledge…
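The abstract frames each task as depending on capability items, each broken down into knowledge and skill components, and the findings describe targeted knowledge and skill injection. The following is a minimal, hypothetical sketch of how that decomposition might be represented and turned into a single prompt. The class names, `build_prompt`, and the prompt wording are assumptions for illustration only, not the authors' implementation or templates.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityItem:
    """Hypothetical: one capability a task depends on, split into its parts."""
    name: str
    # Knowledge component: declarative content the model may be missing.
    knowledge: list[str] = field(default_factory=list)
    # Skill component: demonstrations of applying that knowledge.
    skills: list[str] = field(default_factory=list)


@dataclass
class Task:
    """Hypothetical: a task plus the capability items it depends on."""
    question: str
    capabilities: list[CapabilityItem] = field(default_factory=list)


def build_prompt(task: Task) -> str:
    """Hypothetical prompt assembly: for each capability item, inject its
    knowledge, then its skill demonstrations, and finally pose the question."""
    lines = []
    for i, cap in enumerate(task.capabilities, start=1):
        lines.append(f"Capability {i}: {cap.name}")
        for fact in cap.knowledge:
            lines.append(f"Knowledge: {fact}")
        for demo in cap.skills:
            lines.append(f"Skill example: {demo}")
    lines.append(f"Question: {task.question}")
    lines.append("Answer step by step, applying the capabilities above.")
    return "\n".join(lines)


# Placeholder content only; real capability items would come from the task domain.
example = Task(
    question="<case description>",
    capabilities=[
        CapabilityItem(
            name="Identify the applicable provision",
            knowledge=["<text of the relevant provision>"],
            skills=["<worked example applying the provision to a case>"],
        )
    ],
)
print(build_prompt(example))
```

Keeping knowledge and skill as separate fields mirrors the abstract's distinction between the two components, so each can be supplied or omitted independently when comparing prompt variants.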

Peer Reviews

Decision: Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- The framework’s design is motivated by a learning theory, which adds a theoretical foundation to the methodology.
- The distinction between knowledge and skill acquisition is an interesting concept, reflected in the architecture and prompting strategy.
- The results are promising for domain-specific applications.

Weaknesses

- A primary limitation is the choice to limit experiments to open-source models. It’s not clear why the authors couldn’t include a proprietary model like GPT-4, as this would help validate how the framework performs across a wider range of LLMs. The process of constructing capabilities seems relatively manual, which may restrict the framework's scalability to broader, less-defined tasks outside highly specialized domains.
- Constructing capabilities still seems fairly manual at this point, which

Reviewer 02 · Rating 3 · Confidence 3

Strengths

1. Integrating ideas from education and cognitive science into improving LLM prompting is quite novel.

Weaknesses

Main concerns:
1. The prompting framework requires significant manual effort and domain expertise: it can only be done by experts who can successfully identify the capacity items, decompose the task, and apply all structured prompting techniques. The bar for using it seems too high.
2. Most evaluation uses custom datasets, which makes it hard to compare the proposed framework with a wider range of related methods on more general tasks. Also, the fact that two of the three evaluation domains are

Reviewer 03 · Rating 3 · Confidence 3

Strengths

[Originality] It's a novel idea to apply Bloom’s Taxonomy and Knowledge Space Theory to LLM prompting.
[Significance] LM prompting strategies have a large impact on performance and are an important topic to study.

Weaknesses

[Clarity] The connections to the educational theories seem strained. The term "capability item" also seems unnecessarily abstract. A clearer message would be "hints generated by a large LM improve problem solving of smaller LMs". The paper could also benefit from a detailed description of how high-quality hints ("capability items") are generated with large LMs, e.g., what prompting strategy is used?
[Significance] It is not surprising that hints from high quality LLMs of ~70B par


Taxonomy

Topics: Artificial Intelligence in Law