pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

TL;DR
pyvene is an open-source Python library for flexible, customizable interventions on the internal states of PyTorch models. It supports complex intervention schemes through an extensible framework, facilitating research in model editing, interpretability, and robustness.
Contribution
The paper introduces pyvene, a novel library that simplifies and unifies the process of performing interventions on neural network models in PyTorch, including static and trainable interventions.
Findings
Demonstrates interpretability analyses using causal abstraction and knowledge localization.
Provides a flexible framework for complex interventions on neural models.
Enables sharing and reproducibility of intervened models.
Abstract
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how pyvene provides a unified and extensible framework for performing interventions on neural models and sharing the intervened-upon models with others. We illustrate the power of the library via interpretability analyses using causal abstraction and knowledge localization. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at…
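To make "intervention on a model-internal state" concrete, here is a minimal sketch of a static interchange intervention: the model runs on a base input, but a hidden activation is overwritten with the one produced by a source input. It uses plain PyTorch forward hooks, not pyvene's own API. The toy model, the helper name `run_with_interchange`, and the `units` parameter are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a real network (an assumption for this sketch).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()
site = model[1]  # intervention site: the hidden activation after the ReLU


def run_with_interchange(model, site, base, source, units=slice(None)):
    """Run `base`, replacing the selected hidden `units` at `site`
    with the values those units take when the model runs on `source`."""
    cache = {}

    def save_hook(module, inputs, output):
        # First pass: record the source activation at the site.
        cache["act"] = output.detach()

    def patch_hook(module, inputs, output):
        # Second pass: a non-None return value replaces the module output.
        patched = output.clone()
        patched[..., units] = cache["act"][..., units]
        return patched

    handle = site.register_forward_hook(save_hook)
    try:
        with torch.no_grad():
            model(source)
    finally:
        handle.remove()  # always detach hooks so the model is left unchanged

    handle = site.register_forward_hook(patch_hook)
    try:
        with torch.no_grad():
            return model(base)
    finally:
        handle.remove()


base = torch.randn(1, 4)
source = torch.randn(1, 4)

with torch.no_grad():
    base_out = model(base)
    source_out = model(source)

# Swapping the entire hidden layer makes the output match the source run,
# because everything downstream depends only on that activation.
patched_out = run_with_interchange(model, site, base, source)

# Swapping only the first four hidden units gives a partial intervention.
partial_out = run_with_interchange(model, site, base, source, units=slice(0, 4))
```

Managing hook registration, caching, and cleanup by hand like this grows unwieldy once interventions span several layers, positions, or trainable parameters, which is the bookkeeping a dedicated intervention library is meant to take over.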
Taxonomy
Topics: Software System Performance and Reliability
Methods: Lib
