Feature Integration Spaces: Joint Training Reveals Dual Encoding in Neural Network Representations

Omar Claflin

arXiv:2507.00269 · q-bio.NC · December 10, 2025

TL;DR

This paper proposes a dual encoding hypothesis: neural networks encode feature identity and feature integration as two complementary spaces compressed into the same activations. Joint-training architectures built on this hypothesis improve reconstruction and reduce KL divergence errors.

Contribution

The paper proposes a dual encoding hypothesis and develops sequential and joint-training architectures that capture feature identity and integration patterns simultaneously, with the aim of improving the interpretability of neural representations.

Findings

01  Joint training improves reconstruction by 41.3%

02  Integration features show sensitivity to experimental manipulations

03  Nonlinear components achieve 16.5% standalone improvements

Abstract

Current sparse autoencoder (SAE) approaches to neural network interpretability assume that activations can be decomposed through linear superposition into sparse, interpretable features. Despite high reconstruction fidelity, SAEs consistently fail to eliminate polysemanticity and exhibit pathological behavioral errors. We propose that neural networks encode information in two complementary spaces compressed into the same substrate: feature identity and feature integration. To test this dual encoding hypothesis, we develop sequential and joint-training architectures to capture identity and integration patterns simultaneously. Joint training achieves 41.3% reconstruction improvement and 51.6% reduction in KL divergence errors. This architecture spontaneously develops bimodal feature organization: low squared norm features contributing to integration pathways and the rest contributing…
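To make the contrast in the abstract concrete, the following is a minimal sketch, not the author's implementation. It compares a standard SAE, which decodes sparse features by linear superposition only, with a hypothetical joint model that adds a nonlinear term computed from the same features. All names (`sae_forward`, `joint_forward`, `W_in`, `W_out`), the tanh layer, the dimensions, and the random weights are assumptions made for illustration. The `relative_improvement` helper reflects one plausible reading of how percentages such as 41.3% could be computed; the abstract does not define the metric.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, v):
    """Elementwise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0.0, a) for a in v]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Standard SAE: non-negative features f, then a purely linear decode."""
    f = relu(add(matvec(W_enc, x), b_enc))
    x_hat = add(matvec(W_dec, f), b_dec)
    return f, x_hat

def joint_forward(x, sae_params, W_in, W_out):
    """Hypothetical joint model: linear decode plus a nonlinear term on f.

    The tanh layer stands in for an "integration" pathway; it is an
    assumption of this sketch, not the architecture from the paper.
    """
    f, x_linear = sae_forward(x, *sae_params)
    hidden = [math.tanh(h) for h in matvec(W_in, f)]
    x_hat = add(x_linear, matvec(W_out, hidden))
    return f, x_hat

def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(p || q) between the softmax distributions of two logit vectors."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relative_improvement(baseline, new):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - new) / baseline

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Toy dimensions and untrained random weights, for illustration only.
rng = random.Random(0)
d_model, d_feat, d_hidden = 8, 16, 4
sae_params = (
    random_matrix(d_feat, d_model, rng), [0.0] * d_feat,
    random_matrix(d_model, d_feat, rng), [0.0] * d_model,
)
W_in = random_matrix(d_hidden, d_feat, rng)
W_out = random_matrix(d_model, d_hidden, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
f, x_sae = sae_forward(x, *sae_params)
_, x_joint = joint_forward(x, sae_params, W_in, W_out)
print("SAE MSE:", mse(x, x_sae))
print("Joint MSE:", mse(x, x_joint))
```

With untrained weights the two errors carry no meaning; the sketch only shows where a nonlinear term would enter the reconstruction and how a relative error reduction could be measured once both models are trained.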

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning