Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference

Jorge García-Carrasco; Alejandro Maté; Juan Trujillo

arXiv:2412.15750·cs.LG·December 23, 2024

TL;DR

This paper introduces a method to extract minimal, task-specific circuits from large language models, enabling faster inference and improved interpretability without additional training.

Contribution

The paper presents an automatic technique for extracting task-specific circuits from LLMs for standalone use, reducing model size and improving interpretability without extra training.

Findings

01 Model size reduced by up to 82.77%

02 Extracted circuits are more interpretable

03 Approach works across different tasks

Abstract

Large Language Models (LLMs) have shown impressive performance across a wide range of tasks. However, the size of LLMs is steadily increasing, hindering their application in computationally constrained environments. On the other hand, despite their general capabilities, there are many situations where only one specific task is performed, rendering all other capabilities unnecessary and wasteful. This leads us to the following question: Is it possible to extract the minimal subset from an LLM that is able to perform a specific task in a faster, standalone manner? Recent works on Mechanistic Interpretability (MI) have shown that specific tasks are performed by a localized subset of components, or circuit. However, current techniques used to identify the circuit cannot be used to extract it for its standalone usage. In this work, we propose a novel approach to automatically extract the…
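As a rough illustration of what "keeping only a subset of components" means for model size, the sketch below treats a model as a table of per-component parameter counts and a circuit as a set of component names. This is not the authors' extraction method, which the truncated abstract does not describe. The component names, the parameter counts, and the helpers `extract_circuit` and `size_reduction_pct` are all hypothetical and exist only for this example.

```python
from typing import Dict, Set


def extract_circuit(param_counts: Dict[str, int], circuit: Set[str]) -> Dict[str, int]:
    """Return only the components named in `circuit` (hypothetical helper)."""
    missing = circuit - set(param_counts)
    if missing:
        raise KeyError(f"unknown components: {sorted(missing)}")
    return {name: n for name, n in param_counts.items() if name in circuit}


def size_reduction_pct(full: Dict[str, int], pruned: Dict[str, int]) -> float:
    """Percentage of parameters removed when going from `full` to `pruned`."""
    full_total = sum(full.values())
    pruned_total = sum(pruned.values())
    return 100.0 * (1.0 - pruned_total / full_total)


# Made-up parameter counts for a toy two-layer model (assumption, not real data).
toy_model = {
    "embed": 1000,
    "L0.attn.h0": 200,
    "L0.attn.h1": 200,
    "L0.mlp": 800,
    "L1.attn.h0": 200,
    "L1.attn.h1": 200,
    "L1.mlp": 800,
    "unembed": 1000,
}

# A made-up circuit: the components assumed to matter for one task.
toy_circuit = {"embed", "L0.attn.h1", "L1.attn.h0", "unembed"}

standalone = extract_circuit(toy_model, toy_circuit)
reduction = size_reduction_pct(toy_model, standalone)
print(f"{reduction:.2f}% smaller")
```

In this toy setup the standalone circuit keeps 2400 of 4400 parameters. A figure such as the reported "up to 82.77%" would be read the same way, as the share of the original model removed, although the paper's exact accounting is not given in this excerpt.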

Code & Models

Repositories

jgcarrasco/circuit-extraction (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Focus