Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
Jorge García-Carrasco, Alejandro Maté, Juan Trujillo

TL;DR
This paper introduces a method to extract minimal, task-specific circuits from large language models, enabling faster inference and improved interpretability without additional training.
Contribution
The paper presents an automatic technique for extracting task-specific circuits from LLMs, reducing model size and improving interpretability without extra training.
Findings
Model size reduced by up to 82.77%
Extracted circuits are more interpretable
Approach works across different tasks
Abstract
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks. However, the size of LLMs is steadily increasing, hindering their application in computationally constrained environments. On the other hand, despite their general capabilities, there are many situations where only one specific task is performed, rendering all other capabilities unnecessary and wasteful. This leads us to the following question: Is it possible to extract the minimal subset from an LLM that is able to perform a specific task in a faster, standalone manner? Recent works on Mechanistic Interpretability (MI) have shown that specific tasks are performed by a localized subset of components, or circuit. However, current techniques used to identify the circuit cannot be used to extract it for its standalone usage. In this work, we propose a novel approach to automatically extract the…
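To make the idea of a circuit as a "localized subset of components" concrete, here is a minimal toy sketch. It is not the paper's extraction method: the component names, the parameter counts, and the chosen circuit are all hypothetical, and the sketch only shows what keeping a subset of components means for model size once a circuit has already been identified by some other means.

```python
# Toy illustration only: component names and parameter counts are hypothetical,
# and the circuit is assumed to be given (this does not show how to find it).
model_components = {
    "embed": 1000,
    "attn.0.head.0": 200,
    "attn.0.head.1": 200,
    "mlp.0": 800,
    "attn.1.head.0": 200,
    "attn.1.head.1": 200,
    "mlp.1": 800,
    "unembed": 1000,
}


def extract_circuit(components, circuit_names):
    """Keep only the components named in the circuit, preserving model order."""
    missing = set(circuit_names) - set(components)
    if missing:
        raise KeyError(f"unknown components: {sorted(missing)}")
    return {name: size for name, size in components.items() if name in circuit_names}


def size_reduction_percent(full, pruned):
    """Percentage of parameters removed when going from `full` to `pruned`."""
    full_size = sum(full.values())
    pruned_size = sum(pruned.values())
    return 100.0 * (1.0 - pruned_size / full_size)


# Hypothetical circuit for some task: a subset of the component names above.
circuit = {"embed", "attn.0.head.1", "mlp.1", "unembed"}
pruned = extract_circuit(model_components, circuit)
reduction = size_reduction_percent(model_components, pruned)
print(f"kept {len(pruned)} of {len(model_components)} components; "
      f"size reduced by {reduction:.2f}%")
```

A size-reduction figure such as the one reported in the findings is this kind of ratio: parameters removed divided by parameters in the full model.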
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Focus
