PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding

Panagiotis Koromilas; Andreas D. Demou; James Oldfield; Yannis Panagakis; Mihalis Nicolaou

arXiv:2602.01322 · cs.LG · February 3, 2026

Open Access

TL;DR

PolySAE extends sparse autoencoders with polynomial decoding to model feature interactions, improving interpretability and capturing compositional structure in neural representations without relying on feature co-occurrence.

Contribution

Introduces PolySAE, a sparse autoencoder variant whose decoder adds higher-order polynomial terms to model feature interactions with little parameter overhead, while keeping the linear encoder that makes features interpretable.

Findings

01. Achieves an 8% average improvement in probing F1 across models.

02. Captures feature interactions with minimal parameter overhead.

03. Learns interaction weights that are independent of feature co-occurrence.

Abstract

Sparse autoencoders (SAEs) have emerged as a promising method for interpreting neural network representations by decomposing activations into sparse combinations of dictionary atoms. However, SAEs assume that features combine additively through linear reconstruction, an assumption that cannot capture compositional structure: linear models cannot distinguish whether "Starbucks" arises from the composition of "star" and "coffee" features or merely their co-occurrence. This forces SAEs to allocate monolithic features for compound concepts rather than decomposing them into interpretable constituents. We introduce PolySAE, which extends the SAE decoder with higher-order terms to model feature interactions while preserving the linear encoder essential for interpretability. Through low-rank tensor factorization on a shared projection subspace, PolySAE captures pairwise and triple feature…
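
The decoding idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the top-k ReLU encoder, the tied CP-style factorization, the sizes, and all names (`W_enc`, `D`, `U`, `C2`, `C3`, `encode`, `decode`) are assumptions made for illustration. The sketch shows a linear encoder, a decoder with first-, second-, and third-order terms that all pass through one shared projection `U`, and a check that the low-rank pairwise term equals an explicit sum over feature pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input dim, dictionary size, interaction rank, active features.
d, m, r, k = 16, 64, 4, 8

# Linear encoder (assumed here: ReLU followed by top-k sparsification).
W_enc = rng.normal(scale=0.1, size=(m, d))
b_enc = np.zeros(m)

# Decoder parameters (all names are illustrative).
D = rng.normal(scale=0.1, size=(d, m))   # first-order dictionary atoms
U = rng.normal(scale=0.1, size=(m, r))   # shared projection for higher-order terms
C2 = rng.normal(scale=0.1, size=(d, r))  # output factors, pairwise interactions
C3 = rng.normal(scale=0.1, size=(d, r))  # output factors, triple interactions
b_dec = np.zeros(d)


def encode(x):
    """Map an activation vector to a sparse code with at most k nonzeros."""
    pre = np.maximum(W_enc @ x + b_enc, 0.0)
    z = np.zeros_like(pre)
    top = np.argsort(pre)[-k:]  # indices of the k largest pre-activations
    z[top] = pre[top]
    return z


def decode(z, use_interactions=True):
    """Linear reconstruction plus optional low-rank polynomial terms."""
    x_hat = b_dec + D @ z
    if use_interactions:
        p = U.T @ z  # project the code into the shared r-dimensional subspace
        x_hat = x_hat + C2 @ (p ** 2) + C3 @ (p ** 3)
    return x_hat


def pairwise_explicit(z):
    """Same second-order term, written as an explicit d x m x m interaction tensor."""
    T2 = np.einsum("dq,iq,jq->dij", C2, U, U)
    return np.einsum("dij,i,j->d", T2, z, z)


x = rng.normal(size=d)
z = encode(x)
x_hat = decode(z)

# The factorized form never materializes the full tensor.
extra_params = U.size + C2.size + C3.size
full_pairwise_params = d * m * m
print("extra parameters (factorized):", extra_params)
print("parameters of a full pairwise tensor:", full_pairwise_params)
print("pairwise terms agree:", np.allclose(pairwise_explicit(z), C2 @ ((U.T @ z) ** 2)))
```

In this form, the weight on the interaction between features `i` and `j` is determined by the learned factors `U` and `C2`, not by how often the two features fire together, and setting `use_interactions=False` recovers the purely additive SAE decoder.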

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare