Dependable Artificial Intelligence with Reliability and Security (DAIReS): A Unified Syndrome Decoding Approach for Hallucination and Backdoor Trigger Detection

Hema Karnam Surendrababu (1), Nithin Nagaraj (1) ((1) National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, India)

arXiv:2602.06532 · cs.CR · February 9, 2026

TL;DR

This paper introduces DAIReS, a unified syndrome decoding method that detects both backdoor triggers and hallucinations in large language models, enhancing system security and reliability.

Contribution

The work adapts syndrome decoding to NLP, providing a novel unified approach for detecting security and reliability issues in ML models.

Findings

01. Effective detection of poisoned samples in training data.

02. Identification of hallucinated content in LLMs.

03. Unified approach applicable to security and reliability violations.

Abstract

Machine Learning (ML) models, including Large Language Models (LLMs), are characterized by a range of system-level attributes such as security and reliability. Recent studies have demonstrated that ML models are vulnerable to multiple forms of security violations, among which backdoor data-poisoning attacks represent a particularly insidious threat, enabling unauthorized model behavior and systematic misclassification. In parallel, deficiencies in model reliability can manifest as hallucinations in LLMs, leading to unpredictable outputs and substantial risks for end users. In this work on Dependable Artificial Intelligence with Reliability and Security (DAIReS), we propose a novel unified approach based on Syndrome Decoding for the detection of both security and reliability violations in learning-based systems. Specifically, we adapt the syndrome decoding approach to the NLP…
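
As background for the term used in the abstract, the sketch below shows syndrome decoding in its classical coding-theory setting, using the Hamming(7,4) code. It is not the paper's NLP adaptation, which the truncated abstract does not describe; the function names `syndrome`, `locate_error`, and `correct` are illustrative choices. The idea is that multiplying a received word by a parity-check matrix H over GF(2) yields a syndrome that is zero for valid codewords and, for a single flipped bit, identifies its position.

```python
# Parity-check matrix H for the Hamming(7,4) code.
# Column j (1-indexed) holds the binary digits of j, least significant bit in row 0.
H = [[(j >> bit) & 1 for j in range(1, 8)] for bit in range(3)]


def syndrome(word):
    """Return the 3-bit syndrome H * word^T computed over GF(2)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def locate_error(word):
    """Return the 1-indexed position of a single flipped bit, or 0 if the syndrome is zero."""
    s = syndrome(word)
    return s[0] + 2 * s[1] + 4 * s[2]


def correct(word):
    """Return a copy of word with the single-bit error (if any) flipped back."""
    pos = locate_error(word)
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed


# A valid codeword: bits set at positions 1, 2, 3 (1 XOR 2 XOR 3 == 0).
codeword = [1, 1, 1, 0, 0, 0, 0]

# Simulate a corrupted word by flipping the bit at position 5.
received = list(codeword)
received[4] ^= 1

print(syndrome(codeword))      # zero syndrome: no violation detected
print(locate_error(received))  # position of the flipped bit
print(correct(received) == codeword)
```

A nonzero syndrome flags a deviation from the set of valid codewords without needing the original message, which is the property that makes the technique attractive as a detection mechanism.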

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Malware Detection Techniques