Weight-based Decomposition: A Case for Bilinear MLPs

Michael T. Pearce; Thomas Dooms; Alice Rigg

arXiv:2406.03947 · cs.LG · June 10, 2024

TL;DR

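This paper introduces a method for decomposing the third-order tensor of a bilinear layer into sparsely interacting eigenvectors. The eigenvectors show promising interpretability in preliminary experiments, and the authors report that language models can be finetuned into bilinear variants.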

Contribution

A decomposition of bilinear layers that is fully equivalent to the model's original computation, so the resulting features connect directly to the model weights. The paper also demonstrates finetuning of language models into bilinear variants.

Findings

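01. The decomposition yields eigenvectors that show promising interpretability in preliminary experiments on shallow image classifiers (MNIST) and small language models (Tiny Stories).

02. Bilinear layers perform comparably to other GLUs despite dropping the non-linearity in the gate.

03. Language models can be finetuned into bilinear variants.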

Abstract

Gated Linear Units (GLUs) have become a common building block in modern foundation models. Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs. An attractive quality of bilinear layers is that they can be fully expressed in terms of a third-order tensor and linear operations. Leveraging this, we develop a method to decompose the bilinear tensor into a set of sparsely interacting eigenvectors that show promising interpretability properties in preliminary experiments for shallow image classifiers (MNIST) and small language models (Tiny Stories). Since the decomposition is fully equivalent to the model's original computations, bilinear layers may be an interpretability-friendly architecture that helps connect features to the model weights. Application of our method may not be limited to pretrained bilinear models since we find that…
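The two claims in the abstract, that a bilinear layer can be written as a third-order tensor and that this tensor can be broken into eigenvectors without changing the computation, can be checked numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the layer is taken to be bias-free with the form (Wx) ⊙ (Vx), the eigenvectors are obtained by contracting the tensor with an output direction `u` and eigendecomposing the resulting symmetric matrix, and all function and variable names are hypothetical.

```python
import numpy as np

def bilinear_layer(W, V, x):
    """Bilinear layer: elementwise product of two linear maps, with no gate non-linearity."""
    return (W @ x) * (V @ x)

def bilinear_tensor(W, V):
    """Third-order tensor B[a, i, j] = W[a, i] * V[a, j], symmetrized over (i, j).

    Symmetrizing does not change x^T B[a] x, but it guarantees real eigenvalues.
    """
    B = np.einsum("ai,aj->aij", W, V)
    return 0.5 * (B + B.transpose(0, 2, 1))

def output_direction_eig(B, u):
    """Eigendecompose Q = sum_a u[a] * B[a], the interaction matrix for output direction u."""
    Q = np.einsum("a,aij->ij", u, B)
    eigvals, eigvecs = np.linalg.eigh(Q)  # columns of eigvecs are orthonormal eigenvectors
    return eigvals, eigvecs

# Hypothetical small layer with random weights, for illustration only.
rng = np.random.default_rng(0)
d_in, d_out = 6, 4
W = rng.normal(size=(d_out, d_in))
V = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)
u = rng.normal(size=d_out)  # an arbitrary output direction

B = bilinear_tensor(W, V)

# 1) The layer output equals the tensor contracted twice with the input.
direct = bilinear_layer(W, V, x)
via_tensor = np.einsum("aij,i,j->a", B, x, x)

# 2) The output along u equals a sum of eigenvalue-weighted squared projections.
eigvals, eigvecs = output_direction_eig(B, u)
via_eig = np.sum(eigvals * (eigvecs.T @ x) ** 2)

print(np.allclose(direct, via_tensor))
print(np.isclose(u @ direct, via_eig))
```

Because the eigendecomposition reproduces the layer's output exactly, any interpretation of an eigenvector is a statement about the weights themselves rather than about an approximation of them.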

Code & Models

Repositories

tdooms/bilinear-decomposition (jax · Official)

Taxonomy

Topics: Chromatin Remodeling and Cancer · Algorithms and Data Compression · Genomics and Chromatin Dynamics

Methods: Sparse Evolutionary Training