LoRA-BAM: Input Filtering for Fine-tuned LLMs via Boxed Abstraction Monitors over LoRA Layers

Changshun Wu; Tianyi Duan; Saddek Bensalem; Chih-Hong Cheng

arXiv:2506.00998 · cs.LG · June 3, 2025

TL;DR

LoRA-BAM enhances fine-tuned LLM reliability by adding interpretable OoD detection monitors over LoRA layers, using feature clustering and regularization to filter out questions beyond the model's competence.

Contribution

This paper introduces LoRA-BAM, an approach that places boxed abstraction monitors over LoRA layers for OoD detection and applies a regularization loss during fine-tuning that makes these monitors more interpretable and robust.

Findings

1. Effective OoD detection with boxed abstraction monitors.
2. Improved robustness through regularization during fine-tuning.
3. A lightweight and interpretable method for filtering out-of-distribution queries.

Abstract

Fine-tuning large language models (LLMs) improves performance on domain-specific tasks but can lead to overfitting, making them unreliable on out-of-distribution (OoD) queries. We propose LoRA-BAM, a method that adds OoD detection monitors to the LoRA layer, using boxed abstraction to filter out questions beyond the model's competence. Feature vectors from the fine-tuning data are extracted via the LLM and clustered. Clusters are enclosed in boxes; a question is flagged as OoD if its feature vector falls outside all boxes. To improve interpretability and robustness, we introduce a regularization loss during fine-tuning that encourages paraphrased questions to stay close in the feature space, and we enlarge each box's decision boundary according to the feature variance within its cluster. Our method complements existing defenses by providing lightweight and interpretable OoD detection.
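
The monitor-construction steps described above (cluster the fine-tuning features, enclose each cluster in a box, enlarge the box by the cluster's variance, flag any query outside every box) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the feature vectors have already been extracted from the fine-tuned model. It uses k-means as the clustering step. It models the variance-based enlargement as widening each side of a box by `delta` times the cluster's per-dimension standard deviation. The names `kmeans`, `build_boxes`, `is_ood`, and `delta` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Box = Tuple[Vector, Vector]  # (lower corner, upper corner)


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[List[Vector]]:
    """Plain k-means (an assumed clustering step) with farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(_sqdist(p, c) for c in centers))
        centers.append(list(far))
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sqdist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to its members' mean; keep it if the cluster is empty.
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return [cl for cl in clusters if cl]


def build_boxes(clusters: List[List[Vector]], delta: float = 1.0) -> List[Box]:
    """Enclose each cluster in an axis-aligned box, then widen every side by
    delta * (that dimension's standard deviation within the cluster).
    ASSUMPTION: this is one possible reading of variance-based enlargement."""
    boxes: List[Box] = []
    for cl in clusters:
        lo: Vector = []
        hi: Vector = []
        for col in zip(*cl):  # one feature dimension at a time
            mean = sum(col) / len(col)
            std = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5
            lo.append(min(col) - delta * std)
            hi.append(max(col) + delta * std)
        boxes.append((lo, hi))
    return boxes


def is_ood(x: Sequence[float], boxes: List[Box]) -> bool:
    """Flag x as out-of-distribution if it lies outside every box."""
    return not any(
        all(l <= v <= h for v, l, h in zip(x, lo, hi)) for lo, hi in boxes
    )


# Toy 2-D "features" standing in for extracted feature vectors.
train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.1],
         [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.1, 5.1]]
boxes = build_boxes(kmeans(train, k=2), delta=1.0)
print(is_ood([0.1, 0.15], boxes))  # False: inside the first cluster's box
print(is_ood([2.5, 2.5], boxes))   # True: outside both boxes
```

Larger `delta` values widen every box and so accept more queries as in-distribution. `delta = 0` reduces each box to the tightest axis-aligned bound around its cluster.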
