Jet Expansions of Residual Computation
Yihong Chen, Xiangxiang Xu, Yao Lu, Pontus Stenetorp, Luca Franceschi

TL;DR
This paper introduces a jet-based framework for expanding residual computational graphs, enabling data-free interpretability and analysis of models' internal computations and knowledge structures.
Contribution
It presents a novel, data-free method for analyzing residual computations using jets, unifying and extending existing techniques like the logit lens.
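For reference, here is a minimal sketch of the standard logit lens on a toy residual stack. It decodes the residual stream after each block with the final unembedding matrix and reads off the top token. The weights, the two blocks, and the `logit_lens` helper are hypothetical. The final layer norm that real models apply before unembedding is omitted. This is the baseline technique the paper reports it subsumes, not the paper's jet lens.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def argmax(v: Vector) -> int:
    """Index of the largest entry (first one on ties)."""
    return max(range(len(v)), key=v.__getitem__)


def logit_lens(h0: Vector, blocks: List[Callable[[Vector], Vector]], unembed: Matrix) -> List[int]:
    """Decode the residual stream after every block with the final unembedding.

    The stream evolves as h_l = h_{l-1} + f_l(h_{l-1}); the lens reads
    argmax(U h_l) at each depth l = 0..L. The final layer norm of real
    models is omitted in this toy (assumption).
    """
    h = h0
    preds = [argmax(matvec(unembed, h))]
    for f in blocks:
        h = [a + b for a, b in zip(h, f(h))]  # residual update
        preds.append(argmax(matvec(unembed, h)))
    return preds


# Toy example: 2-d residual stream, 3-token vocabulary, two hypothetical blocks.
U = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
blocks = [
    lambda h: [0.0, 2.0 * h[0]],
    lambda h: [-2.0 * h[0], -2.0 * h[1]],
]
print(logit_lens([1.0, 0.0], blocks, U))  # [0, 1, 2]
```

Each entry is the token the lens would predict if decoding stopped at that depth. Here the prediction changes after every block.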
Findings
Reveals a super-exponential path structure in residual computations.
Enables sketching language models with n-gram statistics.
Allows indexing models' toxicity knowledge levels.
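For intuition about the path structure, consider the special case assumed here: every residual block is linear, h_l = h_{l-1} + A_l h_{l-1}. Multiplying out (I + A_L)...(I + A_1) h_0 gives exactly 2^L paths, one per subset of blocks, and their contributions sum to the full output. The sketch checks this on three toy blocks using hypothetical helpers `residual_forward` and `path_contributions`. It covers only this exactly solvable linear case. Nonlinear blocks are where the paper uses jets, and the (super-)exponential growth it reports refers to its recursive expansion rather than to this simple count.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def residual_forward(h0: Vector, mats: List[Matrix]) -> Vector:
    """Run h_l = h_{l-1} + A_l h_{l-1} through linear residual blocks."""
    h = h0
    for a in mats:
        h = [x + y for x, y in zip(h, matvec(a, h))]
    return h


def path_contributions(h0: Vector, mats: List[Matrix]) -> Dict[Tuple[int, ...], Vector]:
    """Expand (I + A_L)...(I + A_1) h0 into its 2^L paths.

    Each block is either skipped (identity branch) or taken, so a path is the
    tuple of taken block indices; its contribution is A_{i_r} ... A_{i_1} h0.
    """
    paths = {}
    for r in range(len(mats) + 1):
        for subset in combinations(range(len(mats)), r):
            v = h0
            for i in subset:  # combinations are increasing, i.e. shallow to deep
                v = matvec(mats[i], v)
            paths[subset] = v
    return paths


mats = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.5, 0.0], [1.0, 0.0]],
]
h0 = [1.0, 2.0]
paths = path_contributions(h0, mats)
total = [sum(v[j] for v in paths.values()) for j in range(len(h0))]
print(len(paths), total, residual_forward(h0, mats))  # 8 [9.0, 12.0] [9.0, 12.0]
```

The path count doubles with every added block, so exhaustive enumeration is only feasible for shallow toy stacks.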
Abstract
We introduce a framework for expanding residual computational graphs using jets, operators that generalize truncated Taylor series. Our method provides a systematic approach to disentangle contributions of different computational paths to model predictions. In contrast to existing techniques such as distillation, probing, or early decoding, our expansions rely solely on the model itself and require no data, training, or sampling from the model. We demonstrate how our framework grounds and subsumes the logit lens, reveals a (super-)exponential path structure in the recursive residual depth, and opens up several applications. These include sketching a transformer large language model with n-gram statistics extracted from its computations, and indexing the models' levels of toxicity knowledge. Our approach enables data-free analysis of residual computation for model interpretability,…
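The next sketch gives one concrete reading of "n-gram statistics extracted from its computations". It uses a candidate assumed here, which is not necessarily the paper's procedure: the path that skips every residual block. Along that path the token embedding e_t goes straight to the unembedding U, so the logits U e_t depend only on the current token and define a bigram table. The embeddings, unembedding rows, and helper names (`direct_path_bigrams`, `top_bigrams`) are hypothetical, and any final normalization is omitted.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def direct_path_bigrams(embed: Matrix, unembed: Matrix) -> Matrix:
    """Next-token distribution read off the embed -> unembed path alone.

    embed[t] is the embedding of token t; unembed[u] is the output row for
    token u. Skipping every residual block leaves logits U e_t, i.e. a
    bigram table P(next = u | current = t).
    """
    table = []
    for e in embed:
        logits = [sum(ui * ei for ui, ei in zip(u, e)) for u in unembed]
        table.append(softmax(logits))
    return table


def top_bigrams(table: Matrix, vocab: List[str]) -> Dict[str, str]:
    """Most likely next token for each current token."""
    return {vocab[t]: vocab[max(range(len(row)), key=row.__getitem__)]
            for t, row in enumerate(table)}


vocab = ["the", "cat", "sat"]
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical embeddings
U = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]  # hypothetical unembedding rows
print(top_bigrams(direct_path_bigrams(E, U), vocab))
# {'the': 'cat', 'cat': 'sat', 'sat': 'sat'}
```

Longer n-grams would require paths that pass through attention, which this zeroth-order sketch does not model.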
Peer Reviews
Decision: Submitted to ICLR 2025
Well-written and theoretically well-developed, with content that is thorough yet not overwhelming.
I find the theoretical foundation to be solid, but my main concerns lie with the experimental approach. The experiments may be overly empirical and lack statistical rigor. For instance, in Section 5.1, only a handful of jet paths corresponding to specific linguistic functions are selected to demonstrate intervention effects. A more systematic approach is needed to demonstrate that the jet lens is more effective than the logit lens. While the paper primarily offers an analytical framework, it l
The method itself is very interesting; thinking of a network as a sum of paths is very natural, and the jet formalism seems to capture it in a nice way. The authors show that this generalizes prior work such as the logit lens. Since most top-performing network architectures are residual, this is applicable to the most common model types. Interpreting parts of the network is an important and relevant topic.
- The primary issue is confusing exposition and incomplete details, which make the contributions difficult to assess. I enumerated specifics in the "questions" section.
- The exponential expansion factor means that to analyze a model like LLaMA 405B, one would have 2^118 terms, which seems a bit unwieldy.
- Presumably, you need ways of computing k-th order jets for network components (the authors don't seem to discuss this), which makes implementation difficult.
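The point about k-th order jets can be made concrete with Taylor-mode arithmetic: represent a value by its truncated Taylor coefficients along a direction and propagate them through each elementary operation. JAX ships an experimental module for this (`jax.experimental.jet`). The pure-Python sketch here is neither that module nor the authors' implementation. It supports only addition, multiplication, and `exp` on scalars, and `toy_residual_block` is a hypothetical block. Real network components (matrix products, normalization, softmax) would each need their own propagation rule.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    """Truncated Taylor series c[0] + c[1] t + ... + c[k] t^k.

    For f evaluated along x + t*v, c[j] is the j-th t-derivative of
    f(x + t*v) at t = 0, divided by j!.
    """
    c: List[float]

    def __add__(self, other: "Jet") -> "Jet":
        return Jet([a + b for a, b in zip(self.c, other.c)])

    def __mul__(self, other: "Jet") -> "Jet":
        # Cauchy product, truncated at the common order k.
        k = len(self.c) - 1
        return Jet([sum(self.c[j] * other.c[n - j] for j in range(n + 1))
                    for n in range(k + 1)])


def jexp(a: Jet) -> Jet:
    """exp of a truncated series via b' = a' b, i.e. n b_n = sum_{j=1..n} j a_j b_{n-j}."""
    b = [math.exp(a.c[0])]
    for n in range(1, len(a.c)):
        b.append(sum(j * a.c[j] * b[n - j] for j in range(1, n + 1)) / n)
    return Jet(b)


def seed(x: float, v: float, k: int) -> Jet:
    """Order-k jet of the identity at x in direction v: x + v t."""
    coeffs = [x, v] + [0.0] * max(k - 1, 0)
    return Jet(coeffs[: k + 1])


def toy_residual_block(h: Jet) -> Jet:
    """Hypothetical scalar block h + h * exp(h); not a transformer component."""
    return h + h * jexp(h)


print(toy_residual_block(seed(0.0, 1.0, 3)).c)  # [0.0, 2.0, 1.0, 0.5]
```

For h = t, the block h + h * exp(h) expands as 2t + t^2 + t^3/2 + O(t^4), which matches the printed coefficients. Raising `k` costs only longer coefficient lists, not repeated differentiation.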
- The mathematical exposition is clear.
- The authors acknowledge the limitation of their method in capturing the nonlinear model exactly.
- The applicability of the proposed method for evaluating models globally is interesting and promising. In particular, the model-diffing experiments point to potentially useful metrics for assessing the effectiveness of a specific fine-tuning method, the rate of model improvement and saturation, and the potential for certain emergent properties from n-gra
- The first four sections contain clear mathematical expressions. The remaining sections do not use any of these notations, which makes it very hard to digest what the figures are measuring in the context of the proposed method. I’d encourage the authors to improve the clarity of the figures and the captions. At the moment, they are very unclear.
- One way to address the first weakness would be to add a new section, between the theoretical section and the empirical section, which explains in the
Taxonomy
Topics: Adversarial Robustness in Machine Learning
