Control Barrier Function for Aligning Large Language Models

Yuya Miyaoka; Masaki Inoue

arXiv:2511.03121·cs.CL·March 31, 2026

TL;DR

This paper introduces a control barrier function-based safety filter for aligning large language models, enabling safe text generation without modifying the original models.

Contribution

It presents a novel, add-on safety filter framework using control barrier functions to improve LLM alignment without fine-tuning.

Findings

01

The safety filter can be applied without fine-tuning the baseline LLM.

02

It can incorporate existing evaluation models for alignment.

03

The framework is implemented with open-source language models.

Abstract

This paper proposes a control-based framework for aligning large language models (LLMs) that uses a control barrier function (CBF) to ensure user-desirable text generation. The framework applies a CBF safety filter to the token predicted by the baseline LLM, intervening in the generated text. The safety filter has two significant advantages. First, it is an add-on, so it can be used for alignment without fine-tuning the baseline LLM. Second, if an evaluation model for the desired alignment exists, it can be applied directly in the filter design. The overall text-generation system is implemented with open-source language models and aims to generate positive text.
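The abstract does not state the filter's equations, so the following is only a minimal sketch of how an add-on token filter of this kind could look, not the authors' method. It assumes the textbook discrete-time CBF condition h(x_{k+1}) >= (1 - alpha) * h(x_k), where h scores the text so far and h >= 0 means "acceptable". The names `cbf_filter` and `toy_h`, the word lists, and the fallback rule are all hypothetical; in practice `toy_h` would be replaced by an evaluation model such as a sentiment classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon standing in for a learned evaluation model.
POSITIVE = {"great", "good"}
NEGATIVE = {"awful", "bad"}


def toy_h(text: str) -> float:
    """Toy barrier function: positive word count minus twice the negative
    word count, plus a margin of 1. h >= 0 is treated as the safe set."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 1.0 + pos - 2.0 * neg


def cbf_filter(
    text: str,
    candidates: List[Tuple[str, float]],
    h: Callable[[str], float],
    alpha: float = 0.5,
) -> str:
    """Pick the next token from (token, probability) pairs proposed by a
    baseline model, which is itself left unmodified.

    Returns the most probable candidate satisfying the assumed discrete-time
    CBF condition h(text + token) >= (1 - alpha) * h(text). If no candidate
    satisfies it, falls back to the candidate with the largest h value.
    """
    h_now = h(text)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for token, _prob in ranked:
        if h(text + token) >= (1.0 - alpha) * h_now:
            return token  # smallest intervention: first admissible token
    return max(ranked, key=lambda c: h(text + c[0]))[0]


if __name__ == "__main__":
    proposals = [(" awful", 0.6), (" fine", 0.3), (" great", 0.1)]
    print(cbf_filter("The movie was", proposals, toy_h))
```

Because the filter only reorders or rejects the baseline model's proposals, the baseline weights are never touched, which mirrors the add-on property described in the abstract. The parameter `alpha` (between 0 and 1 here) controls how quickly the score is allowed to decay toward the boundary h = 0.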
