Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee

TL;DR
This paper introduces a novel feedforward graph architecture using frozen large language models as nodes, enabling efficient multi-model communication and improved performance on various benchmarks.
Contribution
It extends geometric compatibility of LLM latent spaces to trainable multi-node graphs with minimal trainable parameters, achieving state-of-the-art results.
Findings
Achieves 87.3% on ARC-Challenge, outperforming single models.
Outperforms parameter-matched classifiers on frozen models.
Gradient flow through frozen models is empirically verified.
Abstract
We present a feedforward graph architecture in which heterogeneous frozen large language models serve as computational nodes, communicating through a shared continuous latent space via learned linear projections. Building on recent work demonstrating geometric compatibility between independently trained LLM latent spaces~\cite{armstrong2026thinking}, we extend this finding from static two-model steering to end-to-end trainable multi-node graphs, where projection matrices are optimized jointly via backpropagation through residual stream injection hooks. Three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) encode the input into a shared latent space whose aggregate signal is injected into two larger frozen models (Phi-3-mini, Mistral-7B), whose representations feed a lightweight cross-attention output node. With only 17.6M trainable parameters against approximately 12B…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
