Jet Expansions of Residual Computation

Yihong Chen; Xiangxiang Xu; Yao Lu; Pontus Stenetorp; Luca Franceschi

arXiv:2410.06024·cs.LG·October 10, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces a jet-based framework for expanding residual computational graphs, enabling data-free interpretability and analysis of models' internal computations and knowledge structures.

Contribution

It presents a novel, data-free method for analyzing residual computations using jets, unifying and extending existing techniques like the logit lens.

Findings

01. Reveals a super-exponential path structure in residual computations.

02. Enables sketching language models with n-gram statistics.

03. Allows indexing models' toxicity knowledge levels.

Abstract

We introduce a framework for expanding residual computational graphs using jets, operators that generalize truncated Taylor series. Our method provides a systematic approach to disentangle contributions of different computational paths to model predictions. In contrast to existing techniques such as distillation, probing, or early decoding, our expansions rely solely on the model itself and require no data, training, or sampling from the model. We demonstrate how our framework grounds and subsumes logit lens, reveals a (super-)exponential path structure in the recursive residual depth and opens up several applications. These include sketching a transformer large language model with $n$-gram statistics extracted from its computations, and indexing the models' levels of toxicity knowledge. Our approach enables data-free analysis of residual computation for model interpretability,…
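To make the phrase "operators that generalize truncated Taylor series" concrete, here is a minimal toy sketch, not the paper's algorithm. It assumes a single scalar residual block `x + f(x)` followed by a decoder `g`; the functions `f`, `g`, and `jet_expand` are hypothetical names chosen for illustration. Expanding `g` around the skip input `x` splits the output into a term that ignores the block (order 0, loosely in the spirit of early decoding) plus correction terms that depend on the branch output.

```python
import math

def f(x):
    # Hypothetical residual branch (stands in for an attention or MLP block).
    return 0.1 * math.sin(x)

def g(z):
    # Hypothetical decoder nonlinearity applied to the residual stream.
    return math.tanh(z)

def g_prime(z):
    # First derivative of tanh.
    return 1.0 - math.tanh(z) ** 2

def g_second(z):
    # Second derivative of tanh.
    t = math.tanh(z)
    return -2.0 * t * (1.0 - t * t)

def jet_expand(x, order):
    """Terms of the truncated Taylor expansion of g(x + f(x)) centered at x."""
    delta = f(x)
    terms = [g(x)]  # order 0: decode the skip path alone
    if order >= 1:
        terms.append(g_prime(x) * delta)  # first-order effect of the branch
    if order >= 2:
        terms.append(0.5 * g_second(x) * delta ** 2)  # second-order effect
    return terms

x = 0.5
exact = g(x + f(x))
# Approximation error of the expansion at orders 0, 1, and 2.
errors = [abs(exact - sum(jet_expand(x, k))) for k in range(3)]
print(errors)
```

In this toy setting the error shrinks as the order grows, which is the sense in which a truncated expansion approximates, rather than exactly reproduces, the nonlinear computation.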

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

Well-written and theoretically well-developed, with content that is thorough yet not overwhelming.

Weaknesses

I find the theoretical foundation to be solid, but my main concerns lie with the experimental approach. The experiments may be overly empirical and lack statistical rigor. For instance, in Section 5.1, only a handful of jet paths corresponding to specific linguistic functions are selected to demonstrate intervention effects. A more systematic approach is needed to demonstrate that the jet lens is more effective than the logit lens. While the paper primarily offers an analytical framework, it l

Reviewer 02 · Rating 6 · Confidence 3

Strengths

The method itself is very interesting; thinking of a network as a sum-of-paths is very natural and the jet formalism seems to capture it in a nice way. The authors show that this generalizes prior work such as the logit lens. Since the top network architectures are residual, this is applicable to the most common model types. Interpreting parts of the network is an important and relevant topic.

Weaknesses

- The primary issue is confusing exposition and incomplete details. These make the contributions difficult to assess. I enumerated specifics in the "questions" section.
- The exponential expansion factor means that to analyze a model like LLaMa 405b, one would have 2^118 terms, which seems a bit unwieldy.
- Presumably, you need ways of computing k-th order jets for network components (the authors don't seem to discuss this), which makes implementation difficult.
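The reviewer's point about the exponential number of terms can be illustrated with a small sketch. It assumes a stack of scalar, linear residual blocks, where block l maps h to h + a_l * h; in that special case the sum-of-paths expansion is exact, whereas for nonlinear networks it is only an approximation. The names `expand_paths` and `gains` are hypothetical and not taken from the paper.

```python
from itertools import combinations
from math import prod

def expand_paths(gains, x):
    """Enumerate every path through a stack of linear residual blocks.

    Each path either skips a block (factor 1) or passes through its
    branch (factor gains[l]); a path is identified by the blocks it enters.
    """
    n = len(gains)
    paths = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            paths[subset] = prod(gains[i] for i in subset) * x
    return paths

gains = [0.5, -0.2, 0.3, 0.1]  # hypothetical branch gains, one per block
x = 2.0
paths = expand_paths(gains, x)

# Direct forward pass through the same stack, for comparison.
full = x
for a in gains:
    full = full + a * full

print(len(paths))                 # number of paths for 4 blocks
print(sum(paths.values()), full)  # path sum vs. forward pass

# Term count quoted in the review for a much deeper model.
num_terms = 2 ** 118
```

With 4 blocks there are 2^4 = 16 paths; at the depth quoted by the reviewer the count is roughly 3.3 × 10^35, so exhaustive enumeration is impractical and only a selected subset of paths can be examined.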

Reviewer 03 · Rating 5 · Confidence 4

Strengths

- The mathematical exposition is clear.
- The authors acknowledge the limitation of their method in capturing the nonlinear model exactly.
- The applicability of the proposed method for evaluating models globally is interesting and promising. In particular, the model diffing experiments provide the potential of useful metrics for assessing the effectiveness of a specific fine-tuning method, the rate of model improvement and saturation, and the potential for certain emergent properties from n-gra

Weaknesses

- The first four sections contain clear mathematical expressions. The remaining sections do not use any of these notations, which makes it very hard to digest what the figures are measuring in the context of the proposed method. I'd encourage the authors to improve the clarity of the figures and the captions. At the moment, they are very unclear.
- One way to address the first weakness would be to add a new section, between the theoretical section and the empirical section, which explains in the

Taxonomy

Topics: Adversarial Robustness in Machine Learning