Bifocal Attention: Harmonizing Geometric and Spectral Positional Embeddings for Algorithmic Generalization
Kanishk Awadhiya

TL;DR
This paper introduces Bifocal Attention, a new approach combining geometric and spectral positional embeddings to improve algorithmic reasoning and generalization in language models.
Contribution
It proposes a novel architectural paradigm that decouples positional encoding into geometric and spectral components, together with a training protocol that evolves the spectral component, to improve long-range recursive reasoning.
Findings
Enhanced extrapolation to deeper recursive steps
Improved performance on algorithmic reasoning tasks
Spectral Evolution enables adaptive harmonic basis learning
Abstract
Rotary Positional Embeddings (RoPE) have become the standard positional encoding for Large Language Models (LLMs) because they encode relative position through geometric rotation. However, we identify a significant limitation we term "Spectral Rigidity": standard RoPE uses a fixed geometric decay of rotation frequencies ($\theta_i = 10000^{-2i/d}$), optimized for local syntactic coherence, which fails to capture the long-range, periodic structures inherent in recursive logic and algorithmic reasoning. This results in a "Structure Gap", where models trained on shallow reasoning chains fail to extrapolate to deeper recursive steps. In this work, we introduce Bifocal Attention, an architectural paradigm that decouples positional encoding into two distinct modalities: Geometric Eyes (standard RoPE) for precise token-level manipulation, and Spectral Eyes (Learnable Harmonic Operators) for tracking long-range recursive depth. We…
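The abstract does not give the exact form of the Spectral Eyes, so the following is only a minimal sketch under stated assumptions. It assumes both "eyes" are RoPE-style pairwise rotations that differ only in where their frequencies come from: the geometric part uses the standard fixed schedule $\theta_i = 10000^{-2i/d}$, while the spectral part takes a separate frequency list standing in for learnable harmonic parameters. The function names (`rope_frequencies`, `rotate`, `bifocal_rotate`), the 4/4 dimension split, and the chosen periods 8 and 3 are hypothetical illustrations, not the author's implementation.

```python
import math

def rope_frequencies(dim, base=10000.0):
    """Standard RoPE schedule: theta_i = base ** (-2i / dim), i = 0 .. dim/2 - 1."""
    if dim % 2:
        raise ValueError("RoPE rotates dimensions in pairs, so dim must be even")
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def rotate(vec, pos, freqs):
    """Rotate each pair (vec[2i], vec[2i+1]) by the angle pos * freqs[i]."""
    if len(vec) != 2 * len(freqs):
        raise ValueError("need exactly one frequency per dimension pair")
    out = []
    for i, theta in enumerate(freqs):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def bifocal_rotate(vec, pos, geometric_freqs, spectral_freqs):
    """Hypothetical bifocal encoding: the leading dimensions use the fixed RoPE
    schedule ("Geometric Eyes"); the remaining dimensions use a separate list of
    frequencies standing in for learnable harmonic parameters ("Spectral Eyes")."""
    split = 2 * len(geometric_freqs)
    return (rotate(vec[:split], pos, geometric_freqs)
            + rotate(vec[split:], pos, spectral_freqs))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Usage: 4 geometric dims + 4 spectral dims tuned (hypothetically) to periods 8 and 3.
geo = rope_frequencies(4)
spec = [2 * math.pi / 8, 2 * math.pi / 3]
q = [0.3, -1.2, 0.5, 0.8, 1.0, 0.1, -0.4, 0.7]
k = [0.9, 0.2, -0.6, 0.4, 0.5, -0.3, 0.2, 1.1]
s_near = dot(bifocal_rotate(q, 10, geo, spec), bifocal_rotate(k, 7, geo, spec))
s_far = dot(bifocal_rotate(q, 103, geo, spec), bifocal_rotate(k, 100, geo, spec))
print(abs(s_near - s_far) < 1e-9)  # True: the score depends only on the offset
```

Under these assumptions, the query-key score depends only on the offset between positions, as with any RoPE-style rotation. The spectral half also repeats exactly whenever the offset grows by a common multiple of its chosen periods (24 for periods 8 and 3), whereas the fixed geometric schedule cannot be tuned to a task-specific period.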
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization
