LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling
Keqin Xie

TL;DR
LPC-SM is a hybrid autoregressive architecture that separates local attention, persistent memory, and predictive correction. The paper reports ablation and long-context stability results for a 158M-parameter model.
Contribution
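The paper presents LPC-SM, an architecture that splits sequence modeling into distinct components within the same block: local attention, persistent memory, predictive correction, and run-time control. Slow-memory writes are governed by Orthogonal Novelty Transport (ONT). This avoids relying on attention alone for both local interaction and long-range state.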
Findings
Removing mHC raises the Stage-A final LM loss from 12.630 to 15.127.
Adaptive sparse control lowers the Stage-B final LM loss from 12.137 to 10.787, relative to a matched fixed-ratio continuation.
The full route remains stable at sequence length 4096 (Stage C), with improved diagnostic scores.
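The Python sketch below puts the reported losses on a common scale by computing the absolute and relative change for each comparison. It uses only the four loss values quoted above. The helper name `relative_change` is hypothetical. The two comparisons come from different stages (A and B), so their percentages are not directly comparable with each other.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent (hypothetical helper)."""
    return 100.0 * (after - before) / before

# Stage A: full model -> model with mHC removed (final LM loss)
mhc_abs = 15.127 - 12.630
mhc_ablation = relative_change(12.630, 15.127)

# Stage B: matched fixed-ratio continuation -> adaptive sparse control (final LM loss)
sparse_abs = 10.787 - 12.137
adaptive_sparse = relative_change(12.137, 10.787)

print(f"Removing mHC: {mhc_abs:+.3f} ({mhc_ablation:+.1f}%)")
print(f"Adaptive sparse control: {sparse_abs:+.3f} ({adaptive_sparse:+.1f}%)")
# prints:
# Removing mHC: +2.497 (+19.8%)
# Adaptive sparse control: -1.350 (-11.1%)
```

Removing mHC therefore raises the Stage-A loss by roughly a fifth. Adaptive sparse control lowers the Stage-B loss by roughly a ninth.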
Abstract
Most current long-context language models still rely on attention to handle both local interaction and long-range state, which leaves relatively little room to test alternative decompositions of sequence modeling. We propose LPC-SM, a hybrid autoregressive architecture that separates local attention, persistent memory, predictive correction, and run-time control within the same block, and we use Orthogonal Novelty Transport (ONT) to govern slow-memory writes. We evaluate a 158M-parameter model in three stages spanning base language modeling, mathematical continuation, and 4096-token continuation. Removing mHC raises the Stage-A final LM loss from 12.630 to 15.127, while adaptive sparse control improves the Stage-B final LM loss from 12.137 to 10.787 relative to a matched fixed-ratio continuation. The full route remains stable at sequence length 4096, where Stage C ends with final LM…
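The abstract does not say how LPC-SM implements its local attention. Purely as a generic illustration, one common form is causal sliding-window attention. In that scheme, each token attends only to itself and at most `window - 1` preceding tokens.

The dependency-free Python sketch below builds such a mask and applies it in single-head scaled dot-product attention. The function names, the `window` parameter and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math

def sliding_window_causal_mask(seq_len, window):
    # mask[i][j] is True when query i may attend to key j, i.e. i - window < j <= i
    return [[(i - window < j <= i) for j in range(seq_len)] for i in range(seq_len)]

def local_attention(q, k, v, window):
    """Single-head scaled dot-product attention limited to a causal local window.

    q, k, v are lists of equal-length vectors (lists of floats).
    Returns (outputs, attention_weights). Illustrative sketch only.
    """
    n, d = len(q), len(q[0])
    mask = sliding_window_causal_mask(n, window)
    outputs, weights = [], []
    for i in range(n):
        # Keys outside the window get -inf, so their softmax weight is exactly 0.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # the diagonal is always allowed, so m is finite
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return outputs, weights

# Usage: six toy 2-d tokens, each attending to at most 3 positions (itself + 2 before).
x = [[float(i), 1.0] for i in range(6)]
out, w = local_attention(x, x, x, window=3)
```

With `window=3`, row 5 of `w` places zero weight on positions 0 to 2. The first token can only attend to itself. In a hybrid design like the one described, long-range information would then have to flow through the separate memory path rather than through this attention.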
