# Agentic RAG for Maritime AIoT: Natural Language Access to Structured Data

**Authors:** Oxana Sachenkova, Melker Andreasson, Dongzhu Tan, Alisa Lincke

PMC · DOI: 10.3390/s26041227 · 2026-02-13

## TL;DR

This paper introduces Lighthouse Bot, an agentic RAG system for maritime operations that gives natural-language access to sensor data while keeping sensitive data outside the prompt and making tool use auditable and policy-aligned.

## Contribution

The paper introduces Lighthouse Bot, an agentic RAG system for maritime AIoT with verifiable data access and policy-aligned tool use.

## Key findings

- Lighthouse Bot enables natural language access to complex maritime sensor data with auditable and secure operations.
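- On a 24-question test suite with ground-truth answers, Claude 3.7 achieved close to 90% overall factual correctness, while the open-source Qwen 72B reached 66% overall and 99% on simple retrieval and aggregation queries.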
- The system generates Python code for time-series analysis and executes SQL queries against relational sensor databases.

## Abstract

Maritime operations are increasingly reliant on sensor data to drive efficiency and enhance decision-making. However, despite rapid advances in large language models, including expanded context windows and stronger generative capabilities, critical industrial settings still require secure, role-constrained access to enterprise data and explicit limitation of model context. Retrieval-Augmented Generation (RAG) remains essential to enforce data minimization, preserve privacy, support verifiability, and meet regulatory obligations by retrieving only permissioned, provenance-tracked slices of information at query time. Yet current RAG solutions lack robust protocols for validating numerical accuracy in high-stakes industrial applications. This paper introduces Lighthouse Bot, a novel Agentic RAG system specifically designed to provide natural-language access to complex maritime sensor data, including time-series and relational sensor data. The system addresses a critical need for verifiable autonomous data analysis within the Artificial Intelligence of Things (AIoT) domain, which we explore through a case study on optimizing ferry operations. We present a detailed architecture that integrates a Large Language Model with specialized database and coding agents to transform natural language into executable tasks, enabling core AIoT capabilities such as generating Python code for time-series analysis, executing complex SQL queries on relational sensor databases, and automating workflows, while keeping sensitive data outside the prompt and ensuring auditable, policy-aligned tool use. To evaluate performance, we designed a test suite of 24 questions with ground-truth answers, categorized by query complexity (simple, moderate, complex) and data interaction type (retrieval, aggregation, analysis).
Our results show robust, controlled data access with high factual fidelity: the proprietary Claude 3.7 achieved close to 90% overall factual correctness, while the open-source Qwen 72B achieved 66% overall and 99% on simple retrieval and aggregation queries. These findings underscore the need for a secure limited-context RAG in maritime AIoT and the potential for cost-effective automation of routine exploratory analyses.
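
The evaluation design described in the abstract (questions with ground-truth answers, grouped by complexity and interaction type) can be illustrated with a short sketch. The abstract does not specify how answers were scored, so the relative-tolerance check on numeric answers is an assumption. The names `Case`, `is_correct`, and `correctness_by_group` are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    question: str
    complexity: str   # "simple", "moderate", or "complex"
    interaction: str  # "retrieval", "aggregation", or "analysis"
    expected: float   # ground-truth numeric answer


def is_correct(answer, expected, rel_tol=0.01):
    """Assumed scoring rule: a numeric answer within 1% of ground truth counts as correct."""
    return answer is not None and math.isclose(answer, expected, rel_tol=rel_tol)


def correctness_by_group(cases, answers, key):
    """Fraction of correct answers per group, where key is 'complexity' or 'interaction'."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for case in cases:
        group = getattr(case, key)
        totals[group] += 1
        # A missing answer (None) is scored as incorrect.
        hits[group] += is_correct(answers.get(case.question), case.expected)
    return {group: hits[group] / totals[group] for group in totals}
```

Calling `correctness_by_group(cases, answers, "complexity")` yields one score per complexity level, which is the kind of breakdown behind a statement such as "66% overall and 99% on simple retrieval and aggregation queries".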

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12943969/full.md

---
Source: https://tomesphere.com/paper/PMC12943969