Fibration Policy Optimization
Chang Li, Tshihao Tsu, Yaren Zhang, Chao Xue, Xiaodong He

TL;DR
This paper introduces a hierarchical policy optimization framework for large language models that combines trust-region methods with an algebraic fiber bundle structure, aiming to stabilize policy updates at the token, trajectory, and higher hierarchical levels while improving efficiency.
Contribution
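It derives the Aggregational Policy Censoring Objective (APC-Obj), presented as the first exact unconstrained reformulation of sample-based TV-TRPO; develops Fiber Bundle Gating (FBG), which organizes sampled RL data as a fiber bundle and splits ratio gating into a trajectory-level gate and a per-token gate; and introduces Fibration Policy Optimization with scalable hierarchical gating.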
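For readers unfamiliar with the two formulations being related, the block below writes out the standard textbook forms of a total-variation trust-region problem and of the PPO-style clipped surrogate. This is background only and an assumption about which baseline forms are meant; it is not the paper's APC-Obj, whose definition is not given on this page. The symbols r_t, \hat{A}_t, \delta, and \epsilon are the usual probability ratio, advantage estimate, trust-region radius, and clip range.

```latex
% TV-constrained trust-region problem (standard form)
\max_{\theta}\; \hat{\mathbb{E}}_t\!\left[ r_t(\theta)\,\hat{A}_t \right]
\quad \text{s.t.} \quad
\hat{\mathbb{E}}_t\!\left[ D_{\mathrm{TV}}\!\big(\pi_{\mathrm{old}}(\cdot \mid s_t),\, \pi_{\theta}(\cdot \mid s_t)\big) \right] \le \delta,
\qquad
r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\mathrm{old}}(a_t \mid s_t)}

% Total variation written in terms of the ratio
% (valid when \pi_{\mathrm{old}} has full support)
D_{\mathrm{TV}}\!\big(\pi_{\mathrm{old}}, \pi_{\theta}\big)
= \tfrac{1}{2} \sum_{a} \big| \pi_{\theta}(a \mid s) - \pi_{\mathrm{old}}(a \mid s) \big|
= \tfrac{1}{2}\, \mathbb{E}_{a \sim \pi_{\mathrm{old}}}\!\left[ \big| r(a) - 1 \big| \right]

% PPO-style clipped surrogate (unconstrained)
L^{\mathrm{CLIP}}(\theta)
= \hat{\mathbb{E}}_t\!\left[ \min\!\Big( r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big) \right]
```

The second identity shows why a TV constraint and a bound on |r - 1| are natural counterparts: both limit how far the ratio may move from 1.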
Findings
FiberPO improves token efficiency in policy updates.
Fibration Gating Hierarchy scales to arbitrary depth.
The framework unifies trust-region theory with multi-scale stability control.
Abstract
Large language models are increasingly trained as heterogeneous systems spanning multiple domains, expert partitions, and agentic pipelines, yet prevalent proximal objectives operate at a single scale and lack a principled mechanism for coupling token-level, trajectory-level, and higher-level hierarchical stability control. To bridge this gap, we derive the Aggregational Policy Censoring Objective (APC-Obj), the first exact unconstrained reformulation of sample-based TV-TRPO, establishing that clipping-based surrogate design and trust-region optimization are dual formulations of the same problem. Building on this foundation, we develop Fiber Bundle Gating (FBG), an algebraic framework that organizes sampled RL data as a fiber bundle and decomposes ratio gating into a base-level gate on trajectory aggregates and a fiber-level gate on per-token residuals, with provable first-order…
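The two-level gating idea described in the abstract can be illustrated with a small NumPy sketch. This is a hypothetical construction, not the authors' FBG operator: the function name `two_level_gate`, the use of a masked mean log-ratio as the trajectory aggregate, hard clipping as the gate, and the thresholds `base_clip` and `fiber_clip` are all assumptions made for illustration.

```python
import numpy as np

def two_level_gate(log_ratio, mask, base_clip=0.2, fiber_clip=0.3):
    """Hypothetical two-level gate on per-token log-ratios.

    log_ratio: (n_traj, n_tok) array of log pi_new - log pi_old.
    mask:      (n_traj, n_tok) array, 1 for real tokens, 0 for padding.
    Returns the gated log-ratio with padding positions set to 0.
    """
    log_ratio = np.asarray(log_ratio, dtype=float)
    mask = np.asarray(mask, dtype=float)
    lengths = np.maximum(mask.sum(axis=1, keepdims=True), 1.0)

    # Base level: one aggregate per trajectory (assumed: masked mean).
    base = (log_ratio * mask).sum(axis=1, keepdims=True) / lengths
    # Fiber level: each token's deviation from its trajectory aggregate.
    residual = (log_ratio - base) * mask

    # Gate each level separately (assumed: hard clipping).
    gated_base = np.clip(base, -base_clip, base_clip)
    gated_residual = np.clip(residual, -fiber_clip, fiber_clip)

    # Recombine and zero out padding.
    return (gated_base + gated_residual) * mask

# Trajectory 0 stays close to the old policy; trajectory 1 drifts as a whole.
example = two_level_gate(
    log_ratio=[[0.05, -0.05, 0.0], [1.0, 1.0, 1.0]],
    mask=[[1, 1, 1], [1, 1, 1]],
)
```

In this sketch a trajectory whose tokens all stay within both thresholds passes through unchanged, while a trajectory that drifts as a whole is capped at the base level even if no single token stands out from the others.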
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
