Topology-Enhanced Alignment for Large Language Models: Trajectory Topology Loss and Topological Preference Optimization
Yurui Pan, Ke Xu, Bo Peng

TL;DR
This paper introduces a topology-aware framework for aligning large language models by regularizing semantic trajectories with persistent homology, improving alignment quality over traditional methods.
Contribution
It proposes Trajectory Topology Loss and Topological Preference Optimization, novel methods leveraging topological features to enhance LLM alignment during fine-tuning.
Findings
Topology-enhanced objectives outperform non-topological baselines on preference metrics.
Persistent homology captures meaningful semantic bridges in model trajectories.
The methods keep toxicity at or below baseline levels while improving alignment quality.
Abstract
Alignment of large language models (LLMs) via SFT and RLHF/DPO typically ignores the global geometry of the representation space, relying instead on local token likelihoods or scalar scores. We view generation as tracing a semantic trajectory in hidden space and propose a topology-enhanced alignment framework that regularizes these trajectories using 0-dimensional persistent homology. First, for SFT, we introduce Trajectory Topology Loss (TTL). Treating prompt and gold-answer embeddings as a mixed point cloud, we use a 0D persistent homology algorithm to extract "prompt-answer bridges." TTL aligns the model's actual update direction with these topological bridges rather than arbitrary directions. Second, for DPO, we propose Topological Preference Optimization (TPO). TPO constructs topic-specific semantic preference vectors and aligns the improvement direction between rejected and chosen…
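The abstract does not say which 0D persistent homology algorithm is used, so the following is only an illustrative sketch of how "prompt-answer bridges" might be extracted. It assumes a Vietoris-Rips filtration under Euclidean distance, where 0-dimensional persistence reduces to building a minimum spanning tree with Kruskal's algorithm and a union-find structure: each connected component is born at scale 0 and dies at the length of the edge that merges it into another. Reading the merge edges that join a prompt point to an answer point as the bridges is an assumption of this sketch, and the function names `zero_dim_persistence` and `prompt_answer_bridges` are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the merge edges (death_scale, i, j) of 0D persistent homology.

    Assumption: Vietoris-Rips filtration with Euclidean distance, so the
    merge edges are exactly the minimum spanning tree edges (Kruskal).
    """
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    merges = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here: one 0D feature dies at scale `dist`.
            parent[root_i] = root_j
            merges.append((dist, i, j))
    return merges


def prompt_answer_bridges(prompt_pts, answer_pts):
    """Keep merge edges that connect a prompt point to an answer point.

    Hypothetical definition of a "bridge" for illustration only. Each result
    is (death_scale, prompt_index, answer_index, direction), where direction
    points from the prompt embedding to the answer embedding.
    """
    points = list(prompt_pts) + list(answer_pts)
    n_prompt = len(prompt_pts)
    bridges = []
    for death, i, j in zero_dim_persistence(points):
        if (i < n_prompt) != (j < n_prompt):
            p, a = (i, j) if i < n_prompt else (j, i)
            direction = [y - x for x, y in zip(points[p], points[a])]
            bridges.append((death, p, a - n_prompt, direction))
    return bridges


# Toy 2D "embeddings": two prompt points and two answer points.
prompts = [(0, 0), (1, 0)]
answers = [(4, 0), (5, 0)]
bridges = prompt_answer_bridges(prompts, answers)
```

In this toy case the two prompt points merge with each other first, as do the two answer points, and a single cross-set edge then joins the two clusters. A TTL-style regularizer could compare the model's hidden-state update direction against such a bridge direction, for example with a cosine similarity, though the exact form of the loss is not given in the text above.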
