Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models

Jiasen Zheng; Huajun Zhang; Xu Yan; Ran Hao; Chong Peng

arXiv:2510.27077·cs.CL·November 3, 2025

Open Access

TL;DR

This paper introduces a novel fine-tuning approach combining contrastive distillation and noise-robust training to enhance the safety, robustness, and alignment accuracy of large language models, validated through comprehensive experiments.

Contribution

It presents a new framework that integrates knowledge transfer with robustness constraints, improving safety and reliability of large language models.

Findings

01 Outperforms existing methods in knowledge transfer and robustness

02 Maintains stable outputs under noisy and uncertain inputs

03 Achieves top performance on key safety and alignment metrics

Abstract

This paper addresses the limitations of large-scale language models in safety alignment and robustness by proposing a fine-tuning method that combines contrastive distillation with noise-robust training. The method freezes the backbone model and transfers the knowledge boundaries of the teacher model to the student model through distillation, thereby improving semantic consistency and alignment accuracy. At the same time, noise perturbations and robust optimization constraints are introduced during training to ensure that the model maintains stable predictive outputs under noisy and uncertain inputs. The overall framework consists of distillation loss, robustness loss, and a regularization term, forming a unified optimization objective that balances alignment ability with resistance to interference. To systematically validate its effectiveness, the study designs experiments from…
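
The abstract describes a single objective made of a distillation loss, a robustness loss, and a regularization term. A minimal sketch of how such an objective could be assembled follows. The abstract gives no formulas here, so everything specific in it is an assumption: a plain temperature-scaled KL term stands in for the paper's contrastive distillation loss, the robustness term is taken to be the KL divergence between the student's outputs on clean and noise-perturbed inputs, the regularizer is an L2 penalty on the trainable (non-frozen) parameters, and the names `combined_loss`, `alpha`, `beta`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def combined_loss(teacher_logits, student_logits, student_logits_noisy,
                  trainable_params, alpha=0.5, beta=0.01, temperature=2.0):
    """Assumed form: L = L_distill + alpha * L_robust + beta * L_reg."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    p_noisy = softmax(student_logits_noisy, temperature)

    # Assumption: plain KL distillation, standing in for the contrastive term.
    distill = kl_divergence(p_teacher, p_student)
    # Assumption: penalize drift between clean and noise-perturbed predictions.
    robust = kl_divergence(p_student, p_noisy)
    # Assumption: L2 penalty on the parameters that are actually trained.
    reg = sum(w * w for w in trainable_params)

    return distill + alpha * robust + beta * reg
```

In this sketch, `alpha` sets how strongly prediction stability under noise is weighted against agreement with the teacher, and `beta` controls the regularization strength. In the setup the abstract describes, the noisy logits would come from a second forward pass of the student on a perturbed copy of the same input.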

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks